illium21@interspeech_2021@ISCA


#1 Visual Transformers for Primates Classification and Covid Detection

Authors: Steffen Illium; Robert Müller; Andreas Sedlmeier; Claudia Linnhoff-Popien

We apply the vision transformer, a deep learning model built around the attention mechanism, to mel-spectrogram representations of raw audio recordings. When we add mel-based data augmentation techniques and sample weighting, we achieve comparable performance on both ComParE21 tasks (the Primates Sub-Challenge, PRS, and the COVID-19 Cough Sub-Challenge, CCS), outperforming most single-model baselines. We further introduce overlapping vertical patching and evaluate the influence of different parameter configurations.
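The following is a minimal NumPy sketch of two preprocessing ideas named in the abstract. It rests on assumptions. "Overlapping vertical patching" is taken here to mean strips that span every mel bin and overlap along the time axis. Inverse class-frequency weighting is swapped in as one plausible sample-weighting scheme. The function names `vertical_patches` and `inverse_frequency_weights`, along with the patch width and stride values, are hypothetical illustrations and not the authors' configuration.

```python
import numpy as np


def vertical_patches(spec: np.ndarray, width: int, stride: int) -> np.ndarray:
    """Split a (n_mels, n_frames) mel-spectrogram into vertical strips.

    Assumption: each patch covers ALL mel bins and `width` time frames.
    Consecutive patches start `stride` frames apart, so they overlap
    whenever stride < width. Returns shape (n_patches, n_mels * width):
    one flattened patch per row, ready for a linear embedding layer.
    """
    n_mels, n_frames = spec.shape
    if width > n_frames:
        raise ValueError("patch width exceeds the number of time frames")
    starts = range(0, n_frames - width + 1, stride)
    # Slice every strip along time, then flatten it into a token vector.
    patches = [spec[:, s:s + width].reshape(-1) for s in starts]
    return np.stack(patches)


def inverse_frequency_weights(labels: np.ndarray) -> np.ndarray:
    """Hypothetical sample weighting: weight each sample by the inverse
    frequency of its class, scaled so that the weights sum to len(labels)."""
    labels = np.asarray(labels)
    classes, inverse, counts = np.unique(
        labels, return_inverse=True, return_counts=True
    )
    per_class = len(labels) / (len(classes) * counts)
    return per_class[inverse.reshape(-1)]


if __name__ == "__main__":
    spec = np.random.rand(64, 100)  # stand-in for a 64-bin mel-spectrogram
    tokens = vertical_patches(spec, width=10, stride=5)
    print(tokens.shape)  # (19, 640): (100 - 10) // 5 + 1 strips of 64 * 10 values
    print(inverse_frequency_weights(np.array([0, 0, 0, 1])))
```

With stride equal to width the strips tile the time axis without overlap. Halving the stride, as in the usage above, roughly doubles the token count. This lets each time step appear in two patches at the cost of a longer attention sequence.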