Are ASR foundation models generalized enough to capture features of regional dialects for low-resource languages?

Authors: Tawsif Tashwar Dipto, Azmol Hossain, Rubayet Sabbir Faruque, Md. Rezuwan Hassan, Kanij Fatema, Tanmoy Shome, Ruwad Naswan, Md. Foriduzzaman Zihad, Mohaymen Ul Anam, Nazia Tasnim, Hasan Mahmud, Md Kamrul Hasan, Md. Mehedi Hasan Shawon, Farig Sadeque, Tahsin Reasat

Conventional research on speech recognition modeling relies on the canonical form for most low-resource languages, while automatic speech recognition (ASR) for regional dialects is treated as a fine-tuning task. To investigate the effects of dialectal variations on ASR, we develop a 78-hour annotated Bengali Speech-to-Text (STT) corpus named Ben-10. Investigation from linguistic and data-driven perspectives shows that speech foundation models struggle heavily in regional dialect ASR, in both zero-shot and fine-tuned settings. We observe that all deep learning methods struggle to model speech data under dialectal variations, but dialect-specific model training alleviates the issue. Our dataset also serves as an out-of-distribution (OOD) resource for ASR modeling under resource constraints. The dataset and code developed for this project are publicly available.

Subject: Computation and Language

Publish: 2025-10-27 12:14:52 UTC

2510.23252
