Group-Aware Partial Model Merging for Children's Automatic Speech Recognition


Authors: Thomas Rolland, Alberto Abad

Automatic Speech Recognition (ASR) for children remains challenging, primarily due to large acoustic variability and the limited availability of training data. While supervised fine-tuning of adult pre-trained models has shown promise, it often fails to capture group-specific variation among children. To address this, we introduce GRoup-Aware PARtial model Merging (GRAPAM), a parameter-efficient approach that combines unsupervised clustering, partial fine-tuning, and model merging. Our approach adapts adult pre-trained models to children by first grouping the children's data based on acoustic similarity. Each group is then used to partially fine-tune an adult pre-trained model, and the resulting models are merged at the parameter level. Experiments on the MyST children's speech corpus indicate that GRAPAM achieves a 6% relative improvement in Word Error Rate (WER) using the same amount of data, outperforming full fine-tuning while training fewer parameters. These results highlight the promise of model merging as a scalable and effective strategy for children's ASR.
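The parameter-level merging step can be illustrated with a minimal sketch. The abstract does not state which merging rule GRAPAM uses, so uniform averaging is assumed here purely for illustration; the function `merge_partial`, the parameter names, and the toy values are all hypothetical. Parameters are plain Python lists standing in for model tensors.

```python
from typing import Dict, List

Params = Dict[str, List[float]]  # parameter name -> flat list of weights


def merge_partial(base: Params, group_models: List[Params],
                  tuned_keys: List[str]) -> Params:
    """Merge per-group models that share one adult pre-trained base.

    ASSUMPTION: uniform averaging is used as the merge rule; the source
    does not specify the actual rule. Only parameters listed in
    `tuned_keys` (those updated during partial fine-tuning) are averaged;
    all other parameters are copied from the base model.
    """
    if not group_models:
        raise ValueError("need at least one group model")
    # Start from a copy of the base so frozen parameters stay unchanged.
    merged = {name: list(values) for name, values in base.items()}
    for name in tuned_keys:
        # Pair up the i-th weight of every group model, then average.
        columns = zip(*(model[name] for model in group_models))
        merged[name] = [sum(col) / len(group_models) for col in columns]
    return merged


# Toy example: two acoustic groups, one tuned tensor, one frozen tensor.
base = {"encoder.frozen": [1.0, 2.0], "encoder.tuned": [0.0, 0.0]}
group_a = {"encoder.frozen": [1.0, 2.0], "encoder.tuned": [1.0, 3.0]}
group_b = {"encoder.frozen": [1.0, 2.0], "encoder.tuned": [3.0, 5.0]}

merged = merge_partial(base, [group_a, group_b], ["encoder.tuned"])
print(merged["encoder.tuned"])  # averaged across the two groups
```

Restricting the average to the partially fine-tuned parameters keeps the merged model the same size as a single model, since every group model shares the frozen weights of the base.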

Subject: Audio and Speech Processing

Publish: 2025-11-28 11:35:22 UTC

2511.23098
