SAML: Speaker Adaptive Mixture of LoRA Experts for End-to-End ASR

Authors: Qiuming Zhao, Guangzhi Sun, Chao Zhang, Mingxing Xu, Thomas Fang Zheng

Mixture-of-experts (MoE) models have achieved excellent results in many tasks. However, conventional MoE models are often very large, making them challenging to deploy on resource-constrained edge devices. In this paper, we propose a novel speaker adaptive mixture of LoRA experts (SAML) approach, which uses low-rank adaptation (LoRA) modules as experts to reduce the number of trainable parameters in MoE. Specifically, SAML is applied to quantised and personalised end-to-end automatic speech recognition (ASR) models and incorporates test-time speaker adaptation to improve the performance of heavily compressed models in speaker-specific scenarios. Experiments were performed on the LibriSpeech and TED-LIUM 3 corpora. Remarkably, with a 7x reduction in model size, relative word error rate reductions of 29.1% and 31.1% were achieved on the quantised Whisper model and the Conformer-based attention-based encoder-decoder ASR model respectively, compared to the original full-precision models.
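The idea of using LoRA modules as experts can be illustrated with a minimal sketch. This is not the authors' implementation: the softmax gate, the class names (`LoRAExpert`, `MixtureOfLoRALinear`), the rank, the number of experts, and the initialisation are all assumptions made for illustration, and quantisation and speaker adaptation are left out. It only shows a frozen weight matrix combined with a gated sum of low-rank updates, and how few parameters remain trainable.

```python
import math
import random


def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def softmax(z):
    """Numerically stable softmax over a list of scores."""
    top = max(z)
    e = [math.exp(x - top) for x in z]
    total = sum(e)
    return [x / total for x in e]


class LoRAExpert:
    """One expert: a low-rank update B @ A (hypothetical class name)."""

    def __init__(self, d_in, d_out, rank, rng):
        # A gets small random values; B starts at zero, so the update is
        # initially a no-op (the usual LoRA initialisation).
        self.A = [[rng.gauss(0.0, 0.02) for _ in range(d_in)] for _ in range(rank)]
        self.B = [[0.0] * rank for _ in range(d_out)]

    def delta(self, x):
        return matvec(self.B, matvec(self.A, x))

    def num_params(self):
        return len(self.A) * len(self.A[0]) + len(self.B) * len(self.B[0])


class MixtureOfLoRALinear:
    """Frozen linear layer plus a gated mixture of LoRA experts.

    Assumption: a dense softmax gate over all experts; the paper's actual
    routing scheme is not described in the abstract.
    """

    def __init__(self, W, num_experts, rank, seed=0):
        rng = random.Random(seed)
        self.W = W  # frozen base weights, never updated
        d_out, d_in = len(W), len(W[0])
        self.experts = [LoRAExpert(d_in, d_out, rank, rng) for _ in range(num_experts)]
        self.gate = [[rng.gauss(0.0, 0.02) for _ in range(d_in)] for _ in range(num_experts)]

    def forward(self, x):
        weights = softmax(matvec(self.gate, x))
        y = matvec(self.W, x)
        for w, expert in zip(weights, self.experts):
            y = [yi + w * di for yi, di in zip(y, expert.delta(x))]
        return y, weights

    def trainable_params(self):
        gate = len(self.gate) * len(self.gate[0])
        return gate + sum(e.num_params() for e in self.experts)


if __name__ == "__main__":
    d = 16
    identity = [[1.0 if i == j else 0.0 for j in range(d)] for i in range(d)]
    layer = MixtureOfLoRALinear(identity, num_experts=4, rank=2)
    y, gate_weights = layer.forward([0.1 * i for i in range(d)])
    dense_experts = 4 * d * d  # four full-size expert matrices, for comparison
    print(layer.trainable_params(), "trainable vs", dense_experts, "for dense experts")
```

In this toy setting only the gate and the small `A` and `B` matrices would be trained, while the base weights stay fixed, which is the property that makes LoRA experts attractive for compressed on-device models.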

Subjects: Sound, Audio and Speech Processing

Publish: 2024-06-28 07:37:34 UTC

2406.19706
