2503.07522

Total: 1

#1 Building English ASR model with regional language support [PDF1] [Copy] [Kimi2] [REL]

Authors: Purvi Agrawal, Vikas Joshi, Bharati Patidar, Ankur Gupta, Rupesh Kumar Mehta

In this paper, we present a novel approach to developing an English Automatic Speech Recognition (ASR) system that can effectively handle Hindi queries, without compromising its performance on English. We propose a novel acoustic model (AM), referred to as SplitHead with Attention (SHA) model, features shared hidden layers across languages and language-specific projection layers combined via a self-attention mechanism. This mechanism estimates the weight for each language based on input data and weighs the corresponding language-specific projection layers accordingly. Additionally, we propose a language modeling approach that interpolates n-gram models from both English and transliterated Hindi text corpora. Our results demonstrate the effectiveness of our approach, with a 69.3% and 5.7% relative reduction in word error rate on Hindi and English test sets respectively when compared to a monolingual English model.

Subjects: Audio and Speech Processing , Computation and Language

Publish: 2025-03-10 16:48:51 UTC