KidSpeak: A General Multi-purpose LLM for Kids' Speech Recognition and Screening

Authors: Rohan Sharma, Dancheng Liu, Jingchen Sun, Shijie Zhou, Jiayu Qin, Jinjun Xiong, Changyou Chen

With the rapid advancement of conversational and diffusion-based AI, there is a growing adoption of AI in educational services, ranging from grading and assessment tools to personalized learning systems that provide targeted support for students. However, this adaptability has yet to fully extend to the domain of children's speech, where existing models often fail due to their reliance on datasets designed for clear, articulate adult speech. Children, particularly those in early developmental stages or with speech and language pathologies, present unique challenges that current AI models and datasets are ill-equipped to handle. To address this, we introduce KidSpeak, a multi-task speech-enhanced Foundation Model capable of both generative and discriminative tasks specifically tailored to children's speech patterns. Our framework employs a two-stage training process that incorporates phonetic knowledge into the speech encoder, achieving an average accuracy of 87% across four separate tasks. Furthermore, recognizing the limitations of scalable human annotation and existing speech alignment tools, we propose the Flexible and Automatic Speech Aligner (FASA) and use it to construct high-quality datasets for training and evaluation. This novel alignment tool significantly improves the quality of aligned children's speech from noisy data, enhancing data quality by 13.6x compared to human annotations, as demonstrated on the CHILDES dataset. To the best of our knowledge, KidSpeak and FASA represent the first comprehensive solution designed for speech and language therapy in children, offering both a multi-purpose speech LLM and a robust alignment tool.

Subjects: Audio and Speech Processing, Artificial Intelligence, Computation and Language, Sound

Publish: 2025-12-01 00:19:37 UTC

2512.05994
