yeung19@interspeech_2019@ISCA


#1 A Frequency Normalization Technique for Kindergarten Speech Recognition Inspired by the Role of fo in Vowel Perception

Authors: Gary Yeung ; Abeer Alwan

Accurate automatic speech recognition (ASR) of kindergarten speech is particularly important, as this age group may benefit the most from voice-based educational tools. Because speech data from young children are scarce, kindergarten ASR systems are often trained on older child or adult speech. This study proposes a fundamental frequency (fo)-based normalization technique to reduce the spectral mismatch between kindergarten and older child speech. The technique is based on the tonotopic distances between formants and fo that were developed to model vowel perception. The proposed procedure relies only on computing the median fo across an utterance. Tonotopic distances for vowel perception were reformulated as a linear relationship between formants and fo to provide an effective approach to frequency normalization. This reformulation was verified by examining the formants and fo of child vowel productions. A 208-word ASR experiment, using older child speech for training and kindergarten speech for testing, was performed to compare the proposed technique against piecewise vocal tract length, F3-based, and subglottal resonance normalization techniques. Results suggest that the proposed technique either has performance advantages or requires the computation of fewer parameters.
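
The abstract states that the procedure needs only the median fo of an utterance, but it does not give the mapping from fo to a frequency warp. The sketch below is therefore illustrative only and is not the authors' formula. It assumes a per-frame fo track in Hz with unvoiced frames marked as 0 or None, and it uses a placeholder linear warp that scales the frequency axis by the ratio of a reference median fo to the utterance median fo. The names `median_fo`, `warp_frequencies`, and `ref_fo_hz` are hypothetical.

```python
from statistics import median


def median_fo(fo_track_hz):
    """Return the median fo (Hz) over voiced frames of one utterance.

    Assumption: unvoiced frames are encoded as 0 or None in the track.
    """
    voiced = [f for f in fo_track_hz if f is not None and f > 0]
    if not voiced:
        raise ValueError("no voiced frames in utterance")
    return median(voiced)


def warp_frequencies(freqs_hz, utt_fo_hz, ref_fo_hz):
    """Scale a list of frequencies by ref_fo_hz / utt_fo_hz.

    Placeholder linear warp chosen for illustration; the abstract does not
    specify the actual relationship between formants and fo.
    """
    if utt_fo_hz <= 0:
        raise ValueError("utterance fo must be positive")
    alpha = ref_fo_hz / utt_fo_hz
    return [alpha * f for f in freqs_hz]


# Example: a kindergarten utterance warped toward a hypothetical
# older-child reference fo of 240 Hz.
track = [0, 250.0, 300.0, None, 280.0]
utt_fo = median_fo(track)
warped = warp_frequencies([1000.0, 2000.0], utt_fo, ref_fo_hz=240.0)
```

In this sketch a single scalar per utterance drives the normalization, which mirrors the abstract's point that the method requires fewer parameters than the alternatives it is compared against.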