Unsupervised Rhythm and Voice Conversion to Improve ASR on Dysarthric Speech

Authors: Karl El Hajal, Enno Hermann, Sevada Hovsepyan, Mathew Magimai.-Doss

Automatic speech recognition (ASR) systems struggle with dysarthric speech due to high inter-speaker variability and slow speaking rates. To address this, we explore dysarthric-to-healthy speech conversion for improved ASR performance. Our approach extends the Rhythm and Voice (RnV) conversion framework by introducing a syllable-based rhythm modeling method suited for dysarthric speech. We assess its impact on ASR by training LF-MMI models and fine-tuning Whisper on converted speech. Experiments on the Torgo corpus reveal that LF-MMI achieves significant word error rate reductions, especially for more severe cases of dysarthria, while fine-tuning Whisper on converted data has minimal effect on its performance. These results highlight the potential of unsupervised rhythm and voice conversion for dysarthric ASR. Code available at: https://github.com/idiap/RnV

Subjects: Audio and Speech Processing, Artificial Intelligence, Machine Learning, Sound

Published: 2025-06-02 12:57:36 UTC

arXiv: 2506.01618
