maskey04@interspeech_2004@ISCA

Total: 1

#1 Boostrapping phonetic lexicons for new languages [PDF] [Copy] [Kimi]

Authors: Sameer Maskey ; Alan Black ; Laura Tomokiya

Although phonetic lexicons are critical for many speech applications, the process of building one for a new language can take a significant amount of time and effort. We present a bootstrapping algorithm to build phonetic lexicons for new languages. Our method relies on a large amount of unlabeled text, a small set of 'seed words' with their phonetic transcription, and the proficiency of a native speaker in correctly inspecting the generated pronunciations of the words. The method proceeds by automatically building Letter-to-Sound (LTS) rules from a small set of the most commonly occurring words in a large corpus of a given language. These LTS rules are retrained as new words are added to the lexicon in an Active Learning step. This procedure is repeated until we have a lexicon that can predict the pronunciation of any word in the target language with the accuracy desired. We tested our approach for three languages: English, German and Nepali.