allauzen05@interspeech_2005@ISCA


#1 Diachronic vocabulary adaptation for broadcast news transcription

Authors: Alexandre Allauzen ; Jean-Luc Gauvain

This article investigates the use of Internet news sources to automatically adapt the vocabulary of a French and an English broadcast news transcription system. A specific method is developed to gather training, development and test corpora from selected websites and to normalize them for further use. A vectorial vocabulary adaptation algorithm is described that interpolates word frequencies estimated on adaptation corpora to directly maximize lexical coverage on a development corpus. To test the generality of this approach, experiments were carried out simultaneously in French and in English (UK) on a daily basis for the month of May 2004. In both languages, the out-of-vocabulary (OOV) rate is reduced by more than half.
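The idea of interpolating word frequencies to maximize lexical coverage can be illustrated with a minimal Python sketch. This is not the authors' algorithm: it assumes only two adaptation corpora, a single interpolation weight found by grid search, and a fixed vocabulary size. All function names (`relative_freqs`, `select_vocab`, `oov_rate`, `adapt_vocab`) and the `steps` parameter are hypothetical and chosen for illustration.

```python
from collections import Counter


def relative_freqs(tokens):
    """Relative frequency of each word in one adaptation corpus."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}


def select_vocab(freq_tables, weights, size):
    """Keep the `size` words with the highest interpolated frequency."""
    scores = Counter()
    for table, weight in zip(freq_tables, weights):
        for word, freq in table.items():
            scores[word] += weight * freq
    # Break ties alphabetically so the selection is deterministic.
    ranked = sorted(scores, key=lambda w: (-scores[w], w))
    return set(ranked[:size])


def oov_rate(vocab, dev_tokens):
    """Fraction of development tokens not covered by the vocabulary."""
    misses = sum(1 for token in dev_tokens if token not in vocab)
    return misses / len(dev_tokens)


def adapt_vocab(corpus_a, corpus_b, dev_tokens, size, steps=10):
    """Grid-search the weight that minimizes the OOV rate on dev data.

    Assumption: two corpora and a coarse grid. The paper describes a
    vectorial interpolation whose details are not given in the abstract.
    """
    tables = [relative_freqs(corpus_a), relative_freqs(corpus_b)]
    best = None
    for i in range(steps + 1):
        lam = i / steps
        vocab = select_vocab(tables, (lam, 1.0 - lam), size)
        rate = oov_rate(vocab, dev_tokens)
        if best is None or rate < best[0]:
            best = (rate, lam, vocab)
    return best  # (OOV rate, weight on corpus_a, selected vocabulary)
```

In a diachronic setting, `corpus_a` could be an older static training corpus and `corpus_b` recently collected web news, with `dev_tokens` drawn from text close in time to the broadcasts to be transcribed. Rerunning `adapt_vocab` each day would then track new words as they enter the news.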