magoshi25@interspeech_2025@ISCA


#1 Multi-lingual and Zero-Shot Speech Recognition by Incorporating Classification of Language-Independent Articulatory Features

Authors: Ryo Magoshi, Shinsuke Sakai, Jaeyoung Lee, Tatsuya Kawahara

We address multi-lingual speech recognition, including unknown (zero-shot) languages, based on the International Phonetic Alphabet (IPA) and articulatory features. Articulatory features are language-independent representations of IPA symbols derived from phonetic knowledge. In previous studies, however, they were mostly limited to two dimensions: place of articulation and manner of articulation. Moreover, the classification of articulatory features was not well aligned with phone recognition. In this study, we adopt a comprehensive 24-dimensional vector representation and propose a training method in which IPA tokens and their corresponding articulatory features are predicted simultaneously based on CTC alignment. Experiments are conducted by fine-tuning the wav2vec 2.0 XLS-R model on 22 languages, and the results demonstrate significant improvements both on average and in zero-shot language settings.
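The joint prediction of IPA tokens and articulatory features via CTC alignment can be illustrated with a minimal Python sketch. It is not the authors' implementation and rests on several assumptions: the "CTC alignment" is taken to be a Viterbi forced alignment of the reference IPA sequence through the CTC trellis, blank frames receive no articulatory supervision, and each articulatory dimension is treated as a binary attribute trained with binary cross-entropy. The toy table `ARTIC` (4 hypothetical attributes for three IPA tokens) stands in for the paper's 24-dimensional representation, whose exact encoding may differ, and every name here (`ctc_forced_align`, `artic_targets`, `artic_bce_loss`) is illustrative.

```python
import math

BLANK = 0  # CTC blank index (assumption)

# Hypothetical toy articulatory table: 4 binary attributes per IPA token,
# standing in for the paper's 24-dimensional representation.
# Attribute order (illustrative only): [consonantal, voiced, labial, high]
ARTIC = {
    1: [1, 0, 1, 0],  # /p/
    2: [1, 1, 1, 0],  # /b/
    3: [0, 1, 0, 1],  # /i/
}


def ctc_forced_align(log_probs, labels, blank=BLANK):
    """Viterbi best path through the CTC trellis for a fixed label sequence.

    log_probs: T x V nested list of frame-level log posteriors.
    labels: reference token ids without blanks.
    Returns a length-T list with the token id (or blank) assigned to each frame.
    """
    ext = [blank]
    for lab in labels:
        ext += [lab, blank]  # interleave blanks: _ l1 _ l2 _ ...
    S, T = len(ext), len(log_probs)
    neg = -math.inf
    score = [[neg] * S for _ in range(T)]
    back = [[0] * S for _ in range(T)]
    score[0][0] = log_probs[0][ext[0]]
    if S > 1:
        score[0][1] = log_probs[0][ext[1]]
    for t in range(1, T):
        for s in range(S):
            cands = [(score[t - 1][s], s)]  # stay in the same state
            if s >= 1:
                cands.append((score[t - 1][s - 1], s - 1))  # advance by one
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                cands.append((score[t - 1][s - 2], s - 2))  # skip a blank
            best, arg = max(cands)
            if best > neg:
                score[t][s] = best + log_probs[t][ext[s]]
                back[t][s] = arg
    # A valid path ends in the final blank or the final label.
    s = S - 1
    if S > 1 and score[T - 1][S - 2] > score[T - 1][S - 1]:
        s = S - 2
    path = []
    for t in range(T - 1, -1, -1):
        path.append(ext[s])
        s = back[t][s]
    return path[::-1]


def artic_targets(path, table=ARTIC, blank=BLANK):
    """Frame-level articulatory targets; None marks blank frames (no supervision)."""
    return [None if tok == blank else table[tok] for tok in path]


def artic_bce_loss(artic_probs, targets):
    """Mean binary cross-entropy over non-blank frames and all attribute dims."""
    total, count = 0.0, 0
    for probs, tgt in zip(artic_probs, targets):
        if tgt is None:
            continue
        for p, y in zip(probs, tgt):
            p = min(max(p, 1e-7), 1 - 1e-7)  # clamp to avoid log(0)
            total -= y * math.log(p) + (1 - y) * math.log(1 - p)
            count += 1
    return total / count if count else 0.0


if __name__ == "__main__":
    # 5 frames, vocabulary {blank, /p/, /b/, /i/}, reference sequence /p i/.
    probs = [
        [0.1, 0.8, 0.05, 0.05],
        [0.6, 0.3, 0.05, 0.05],
        [0.2, 0.1, 0.1, 0.6],
        [0.1, 0.1, 0.1, 0.7],
        [0.7, 0.1, 0.1, 0.1],
    ]
    log_probs = [[math.log(p) for p in row] for row in probs]
    path = ctc_forced_align(log_probs, [1, 3])
    print(path)  # [1, 0, 3, 3, 0]
    targets = artic_targets(path)
    # Uninformative predictions (0.5 everywhere) give a loss of about ln 2 ≈ 0.693.
    print(artic_bce_loss([[0.5] * 4] * len(path), targets))
```

In a full training loop, this articulatory term would presumably be added to the standard CTC loss with a weight, for example L = L_CTC + λ·L_AF, where λ is a hypothetical hyperparameter, and the frame posteriors would come from the fine-tuned XLS-R encoder rather than hand-written lists. Supervising articulatory features only on frames that the alignment assigns to an IPA token is one way to keep the two classification tasks synchronized, which addresses the misalignment problem the abstract describes.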

Subject: INTERSPEECH.2025 - Language and Multimodal