2601.00557

A Language-Agnostic Hierarchical LoRA-MoE Architecture for CTC-based Multilingual ASR

Authors: Yuang Zheng, Yuxiang Mei, Dongxing Xu, Jie Chen, Yanhua Long

Large-scale multilingual ASR (mASR) models such as Whisper achieve strong performance but incur high computational and latency costs, which limits their deployment on resource-constrained edge devices. In this study, we propose a lightweight, language-agnostic multilingual ASR system based on a CTC architecture with domain adaptation. Specifically, we introduce a Language-agnostic Hierarchical LoRA-MoE (HLoRA) framework integrated into an mHuBERT-CTC model, enabling end-to-end decoding via LID-posterior-driven LoRA routing. The hierarchical design consists of a multilingual shared LoRA that learns language-invariant acoustic representations and language-specific LoRA experts that model language-dependent characteristics. The proposed routing mechanism removes the need for prior language identity information or explicit language labels during inference, achieving truly language-agnostic decoding. Experiments on the MSR-86K and MLC-SLM 2025 Challenge datasets demonstrate that HLoRA achieves performance competitive with state-of-the-art two-stage inference methods while using only single-pass decoding, significantly improving decoding efficiency for low-resource mASR applications.
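
The hierarchical routing described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract does not give the layer equations, so the soft weighting of expert outputs by the LID posterior, the class name `HLoRALinear`, the rank, and all shapes are assumptions made for illustration. The sketch adds a shared low-rank update and a posterior-weighted sum of per-language low-rank updates to a frozen linear projection.

```python
import numpy as np


def softmax(z):
    """Numerically stable softmax over the last axis."""
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


class HLoRALinear:
    """Hypothetical linear layer with one shared LoRA and per-language LoRA experts.

    Assumed form (not taken from the paper):
        y = W x + B_s A_s x + sum_l p_l * B_l A_l x
    where p is a language-identification (LID) posterior.
    """

    def __init__(self, d_in, d_out, n_langs, rank=4, seed=0):
        rng = np.random.default_rng(seed)
        # Stand-in for the frozen pretrained projection.
        self.W = rng.normal(0.0, 0.02, (d_out, d_in))
        # Shared LoRA: A is random, B starts at zero (usual LoRA initialisation).
        self.A_shared = rng.normal(0.0, 0.02, (rank, d_in))
        self.B_shared = np.zeros((d_out, rank))
        # One LoRA expert per language, stacked along the first axis.
        self.A_lang = rng.normal(0.0, 0.02, (n_langs, rank, d_in))
        self.B_lang = np.zeros((n_langs, d_out, rank))

    def __call__(self, x, lid_posterior):
        # x: (T, d_in) frame features; lid_posterior: (n_langs,), sums to 1.
        base = x @ self.W.T
        shared = (x @ self.A_shared.T) @ self.B_shared.T
        # Project frames through every expert: (n_langs, T, rank) then (n_langs, T, d_out).
        low = np.einsum("td,lrd->ltr", x, self.A_lang)
        experts = np.einsum("ltr,lor->lto", low, self.B_lang)
        # Weight each expert's output by its LID posterior and sum over languages.
        routed = np.einsum("l,lto->to", lid_posterior, experts)
        return base + shared + routed


if __name__ == "__main__":
    layer = HLoRALinear(d_in=8, d_out=6, n_langs=3)
    frames = np.random.default_rng(1).normal(size=(5, 8))
    # Hypothetical LID logits; in a real system these would come from an LID head.
    posterior = softmax(np.array([2.0, 0.1, -1.0]))
    print(layer(frames, posterior).shape)  # (5, 6)
```

Because the routing weights come from a posterior computed on the audio itself, no language label has to be supplied at inference time, and the whole computation stays within a single forward pass. A hard (argmax) selection of one expert would be an equally plausible reading of the abstract; the soft mixture is used here only because it is the simpler case to write down.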

Subjects: Computation and Language, Sound, Audio and Speech Processing

Published: 2026-01-02 04:08:39 UTC