shi24g@interspeech_2024@ISCA

Total: 1

#1 ML-SUPERB 2.0: Benchmarking Multilingual Speech Models Across Modeling Constraints, Languages, and Datasets

Authors: Jiatong Shi; Shih-Heng Wang; William Chen; Martijn Bartelds; Vanya Bannihatti Kumar; Jinchuan Tian; Xuankai Chang; Dan Jurafsky; Karen Livescu; Hung-yi Lee; Shinji Watanabe

ML-SUPERB evaluates self-supervised learning (SSL) models on the tasks of language identification and automatic speech recognition (ASR). The benchmark treats the models as feature extractors and uses a single shallow downstream model, which can be fine-tuned for a downstream task. However, real-world use cases may require different configurations. This paper presents ML-SUPERB 2.0, a new benchmark for evaluating pre-trained SSL and supervised speech models across downstream models, fine-tuning setups, and efficient model adaptation approaches. We find performance improvements over the original ML-SUPERB setup, though performance depends on the downstream model design. We also find large performance differences across languages and datasets, suggesting the need for more targeted approaches to improve multilingual ASR performance.
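For readers unfamiliar with the evaluation protocol the abstract refers to, below is a minimal sketch of the SUPERB-style "frozen feature extractor + shallow downstream model" setup: a pre-trained SSL model whose layer outputs are combined with learnable weights and fed to a small head (e.g., frame-level logits for CTC-based ASR, or pooled for language identification). This is not the ML-SUPERB 2.0 code; the checkpoint name, output size, and head design are illustrative assumptions.

```python
# Sketch of a SUPERB-style downstream probe (illustrative, not ML-SUPERB 2.0 code).
import torch
import torch.nn as nn
from transformers import Wav2Vec2Model


class ShallowDownstream(nn.Module):
    """Frozen SSL encoder + learnable layer-weighted sum + small linear head."""

    def __init__(self, ssl_name="facebook/wav2vec2-xls-r-300m",  # assumed checkpoint
                 num_outputs=100, freeze_ssl=True):
        super().__init__()
        self.ssl = Wav2Vec2Model.from_pretrained(ssl_name)
        if freeze_ssl:  # "feature extractor" constraint: no SSL fine-tuning
            for p in self.ssl.parameters():
                p.requires_grad = False
        num_layers = self.ssl.config.num_hidden_layers + 1  # + CNN front-end output
        self.layer_weights = nn.Parameter(torch.zeros(num_layers))
        self.head = nn.Linear(self.ssl.config.hidden_size, num_outputs)

    def forward(self, waveform):  # waveform: (batch, samples) at 16 kHz
        out = self.ssl(waveform, output_hidden_states=True)
        hidden = torch.stack(out.hidden_states, dim=0)            # (L, B, T, D)
        w = torch.softmax(self.layer_weights, dim=0)              # layer weights
        feats = (w[:, None, None, None] * hidden).sum(dim=0)      # (B, T, D)
        return self.head(feats)                                   # frame-level logits


# Usage sketch: two dummy 1-second utterances through the frozen extractor.
model = ShallowDownstream(num_outputs=32)
logits = model(torch.randn(2, 16000))
print(logits.shape)  # (2, num_frames, 32)
```

ML-SUPERB 2.0 extends beyond this fixed configuration by also varying the downstream model, allowing partial or full fine-tuning of the SSL encoder, and testing efficient adaptation approaches, which is where the abstract's reported performance differences arise.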