Exploring the Impact of Pretrained Models and Web-Scraped Data for the 2022 NIST Language Recognition Evaluation

Authors: Tanel Alumäe, Kunnar Kukk, Viet-Bac Le, Claude Barras, Abdel Messaoudi, Waad Ben Kheder

This paper describes the Vocapia-TalTech team's systems developed for the 2022 NIST Language Recognition Evaluation (LRE22), which focused on spoken language identification of African languages, including low-resource languages. In both the fixed and open conditions, our primary systems were fused from multiple individual systems using logistic regression. In the fixed condition, we largely relied on wav2vec2.0 conformer models pretrained on the provided training data. In the open condition, we used external pretrained wav2vec2.0 models, phonotactic models, and features derived from a multilingual speech recognition system, and we also augmented the provided target language development data with additional data scraped from the web. On the LRE22 evaluation data, our final fixed and open condition systems obtained excellent results, with primary metric Cact values of 0.111 and 0.067, respectively. A post-evaluation study shows that both pretrained models and additional data are important for accurate models.
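
The abstract does not spell out how the logistic regression fusion is set up. As a minimal sketch, assuming each individual system outputs one score per language for every utterance, a common form of such fusion learns one scalar weight per system plus a per-language bias by minimizing multinomial cross-entropy. The function names, toy data, and hyperparameters below are hypothetical illustrations, not the authors' implementation.

```python
import math
import random


def softmax(z):
    """Numerically stable softmax over a list of scores."""
    m = max(z)
    e = [math.exp(v - m) for v in z]
    total = sum(e)
    return [v / total for v in e]


def fuse(scores, weights, bias):
    """Weighted sum of per-system score vectors plus a per-language bias.

    scores[k][c] is the score of system k for language c (one utterance).
    """
    return [
        sum(w * s[c] for w, s in zip(weights, scores)) + bias[c]
        for c in range(len(bias))
    ]


def train_fusion(data, labels, n_sys, n_lang, lr=0.1, epochs=200):
    """Fit fusion weights and biases with batch gradient descent
    on the multinomial logistic regression (cross-entropy) loss."""
    weights = [1.0 / n_sys] * n_sys
    bias = [0.0] * n_lang
    n = len(data)
    for _ in range(epochs):
        grad_w = [0.0] * n_sys
        grad_b = [0.0] * n_lang
        for scores, y in zip(data, labels):
            post = softmax(fuse(scores, weights, bias))
            for c in range(n_lang):
                err = post[c] - (1.0 if c == y else 0.0)
                grad_b[c] += err
                for k in range(n_sys):
                    grad_w[k] += err * scores[k][c]
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias = [b - lr * g / n for b, g in zip(bias, grad_b)]
    return weights, bias


def predict(scores, weights, bias):
    """Index of the language with the highest fused score."""
    fused = fuse(scores, weights, bias)
    return max(range(len(fused)), key=fused.__getitem__)


def make_toy_data(n_utts=300, n_lang=3, seed=0):
    """Synthetic scores (assumption): system 0 is informative, system 1 is noise."""
    rng = random.Random(seed)
    data, labels = [], []
    for _ in range(n_utts):
        y = rng.randrange(n_lang)
        informative = [rng.gauss(2.0 if c == y else 0.0, 1.0) for c in range(n_lang)]
        noise = [rng.gauss(0.0, 1.0) for _ in range(n_lang)]
        data.append([informative, noise])
        labels.append(y)
    return data, labels


if __name__ == "__main__":
    data, labels = make_toy_data()
    weights, bias = train_fusion(data, labels, n_sys=2, n_lang=3)
    correct = sum(predict(s, weights, bias) == y for s, y in zip(data, labels))
    print("fusion weights:", weights)
    print("training accuracy:", correct / len(labels))
```

In this toy setup the learned weight of the informative system should end up larger than that of the noise-only system, which is the behavior one wants from a trained fusion: unreliable systems are down-weighted rather than averaged in equally.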

Subject: INTERSPEECH.2023 - Language and Multimodal

alumae23@interspeech_2023@ISCA
