2509.15516


State-of-the-Art Dysarthric Speech Recognition with MetaICL for on-the-fly Personalization

Authors: Dhruuv Agarwal, Harry Zhang, Yang Yu, Quan Wang

Personalizing Automatic Speech Recognition (ASR) for dysarthric speech is crucial but challenging because of the cost of training and storing individual user adapters. We propose a hybrid meta-training method for a single model that excels at zero-shot and few-shot on-the-fly personalization via in-context learning (ICL). Measured by Word Error Rate (WER) on state-of-the-art evaluation subsets, the model achieves 13.9% WER on Euphonia, surpassing speaker-independent baselines (17.5% WER) and rivaling user-specific personalized models. On SAP Test 1, its 5.3% WER significantly outperforms the 8% WER achieved even by personalized adapters. We also demonstrate the importance of example curation: an oracle text-similarity selection method shows that 5 curated examples can match the performance of 19 randomly selected ones, highlighting a key area for future efficiency gains. Finally, we conduct data ablations to measure the data efficiency of this approach. This work presents a practical, scalable, and personalized solution.
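The abstract reports every result as WER and describes an "oracle" text-similarity curation of in-context examples. The Python sketch below illustrates both ideas under stated assumptions; it is not the authors' implementation. `word_error_rate` is the standard word-level Levenshtein definition of WER. `relative_reduction` simply turns the quoted WER pairs into relative improvements. `select_icl_examples` is a hypothetical oracle selector: it ranks a user's pool of (utterance id, transcript) pairs by similarity to the target utterance's ground-truth transcript. Because the abstract does not name the similarity metric, `difflib.SequenceMatcher` is used here purely as a stand-in.

```python
from difflib import SequenceMatcher


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Standard WER: (substitutions + deletions + insertions) / reference word count,
    computed with a word-level Levenshtein alignment."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] = edit distance between the reference prefix seen so far and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(prev[j] + 1,         # deletion
                          curr[j - 1] + 1,     # insertion
                          prev[j - 1] + cost)  # substitution or match
        prev = curr
    return prev[len(hyp)] / len(ref)


def relative_reduction(baseline_wer: float, new_wer: float) -> float:
    """Fraction of the baseline WER removed by the new system."""
    return (baseline_wer - new_wer) / baseline_wer


def select_icl_examples(target_transcript: str,
                        pool: list[tuple[str, str]],
                        k: int = 5) -> list[tuple[str, str]]:
    """Hypothetical oracle curation: it peeks at the target's ground-truth text,
    which is unavailable at inference time, and keeps the k most similar pool items.
    SequenceMatcher.ratio() is an assumed stand-in for the paper's similarity metric."""
    def similarity(example: tuple[str, str]) -> float:
        return SequenceMatcher(None, target_transcript.lower(), example[1].lower()).ratio()

    return sorted(pool, key=similarity, reverse=True)[:k]


if __name__ == "__main__":
    # One deletion ("the") plus one substitution ("lights" -> "light") over 5 words.
    print(word_error_rate("turn on the kitchen lights", "turn on kitchen light"))
    # WER figures quoted in the abstract (Euphonia, then SAP Test 1).
    print(round(relative_reduction(17.5, 13.9), 3))
    print(round(relative_reduction(8.0, 5.3), 4))
    pool = [("u1", "turn on the lights"), ("u2", "call my sister"), ("u3", "what is the weather")]
    print(select_icl_examples("turn on the kitchen lights", pool, k=1))
```

By this arithmetic, the quoted numbers correspond to a relative WER reduction of about 20.6% over the speaker-independent baseline on Euphonia and about 33.8% over personalized adapters on SAP Test 1. Because the oracle selector relies on the target's reference transcript, it serves as an upper bound on curation quality rather than as a deployable selection strategy.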

Subjects: Audio and Speech Processing, Sound

Published: 2025-09-19 01:40:57 UTC