2601.19451


Dynamic Multi-Expert Projectors with Stabilized Routing for Multilingual Speech Recognition

Authors: Isha Pandey, Ashish Mittal, Vartul Bahuguna, Ganesh Ramakrishnan

Recent advances in LLM-based automatic speech recognition (ASR) connect frozen speech encoders to large language models (LLMs) through lightweight projectors. While effective in monolingual settings, a single projector struggles to capture the diverse acoustic-to-semantic mappings required for multilingual ASR. To address this, we propose SMEAR-MoE, a stabilized Mixture-of-Experts (MoE) projector that ensures dense gradient flow to all experts, preventing expert collapse while enabling cross-lingual sharing. We systematically compare monolithic, static multi-projector, and dynamic MoE designs across four Indic languages (Hindi, Marathi, Tamil, Telugu). SMEAR-MoE delivers up to a 7.6% relative word error rate (WER) reduction over the single-projector baseline while maintaining comparable runtime efficiency. Analysis of expert routing further shows linguistically meaningful specialization, with related languages sharing experts. These results demonstrate that stable multi-expert projectors are key to scalable and robust multilingual ASR.
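The abstract does not spell out the projector's internals. The name suggests SMEAR (Soft Merging of Experts with Adaptive Routing), where the router's probabilities are used to average the experts' parameters into one merged projector instead of sending each input to a sparse top-k subset, so every expert receives a gradient scaled by its routing probability. The sketch below illustrates only that general idea under stated assumptions: `SoftMergedProjector`, `mean_pool` and `matvec_row` are hypothetical names, routing is assumed to be one decision per utterance over mean-pooled encoder frames, each expert is assumed to be a single linear layer, and the paper's routing "stabilization" is not modeled. It uses plain Python lists so it runs without extra dependencies.

```python
import math
import random
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]  # row-major: rows x cols


def softmax(logits: Vector) -> Vector:
    # Subtract the max for numerical stability; every output is strictly positive.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def mean_pool(frames: Matrix) -> Vector:
    # Average encoder frames over time: (T x d_in) -> (d_in,)
    return [sum(col) / len(frames) for col in zip(*frames)]


def matvec_row(x: Vector, w: Matrix) -> Vector:
    # Row vector times matrix: (d_in,) @ (d_in x d_out) -> (d_out,)
    return [sum(xi * w[i][j] for i, xi in enumerate(x)) for j in range(len(w[0]))]


class SoftMergedProjector:
    """Hypothetical SMEAR-style projector: experts are merged, not sparsely selected."""

    def __init__(self, d_in: int, d_out: int, num_experts: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        # Router maps a pooled utterance vector to one logit per expert.
        self.router = [[rng.gauss(0.0, 0.1) for _ in range(num_experts)] for _ in range(d_in)]
        # Each expert is a single linear layer (W_e: d_in x d_out, b_e: d_out); an assumption.
        self.experts = [
            [[rng.gauss(0.0, 0.1) for _ in range(d_out)] for _ in range(d_in)]
            for _ in range(num_experts)
        ]
        self.biases = [[0.0] * d_out for _ in range(num_experts)]

    def route(self, frames: Matrix) -> Vector:
        # One probability distribution over experts per utterance (an assumption).
        return softmax(matvec_row(mean_pool(frames), self.router))

    def merge(self, probs: Vector) -> Tuple[Matrix, Vector]:
        # W = sum_e p_e * W_e and b = sum_e p_e * b_e. Since dL/dW_e = p_e * dL/dW and
        # softmax gives p_e > 0, every expert is updated whenever the merged projector is.
        d_in, d_out = len(self.experts[0]), len(self.experts[0][0])
        w = [
            [sum(p * e[i][j] for p, e in zip(probs, self.experts)) for j in range(d_out)]
            for i in range(d_in)
        ]
        b = [sum(p * bias[j] for p, bias in zip(probs, self.biases)) for j in range(d_out)]
        return w, b

    def __call__(self, frames: Matrix) -> Tuple[Matrix, Vector]:
        # Project every encoder frame with the single merged projector.
        probs = self.route(frames)
        w, b = self.merge(probs)
        out = [[y + bj for y, bj in zip(matvec_row(x, w), b)] for x in frames]
        return out, probs


if __name__ == "__main__":
    proj = SoftMergedProjector(d_in=4, d_out=3, num_experts=2)
    frames = [[0.1, 0.2, 0.3, 0.4], [0.5, 0.4, 0.3, 0.2], [0.0, -0.1, 0.2, 0.1]]
    out, probs = proj(frames)
    print("routing probabilities:", [round(p, 3) for p in probs])
    print("projected shape:", (len(out), len(out[0])))
```

Because these toy experts are linear, merging parameters gives the same output as mixing the experts' outputs. For nonlinear MLP projectors the two differ; parameter merging keeps the per-frame cost at that of a single projector regardless of the number of experts.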

Subject: Computation and Language

Published: 2026-01-27 10:37:03 UTC