mrva06@interspeech_2006@ISCA

#1 Unsupervised language model adaptation for Mandarin broadcast conversation transcription

Authors: David Mrva; Philip C. Woodland

This paper investigates unsupervised language model adaptation on a new task: Mandarin broadcast conversation transcription. It was found that N-gram adaptation yields a 1.1% absolute character error rate gain, and continuous space language model adaptation with PLSA and LDA brings a 1.3% absolute gain. Moreover, a broadcast news language model alone, trained on a large amount of data, under-performs a model that also includes a small amount of broadcast conversation data by 1.8% absolute character error rate. Although the broadcast news and broadcast conversation tasks are related, this result shows a large mismatch between them. In addition, it was found that broadcast news and broadcast conversation data can be reliably distinguished using the N-gram adaptation framework.
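The abstract does not spell out the adaptation mechanism, but a common form of unsupervised N-gram adaptation is to estimate a small in-domain model from the first-pass recognizer output and linearly interpolate it with the background model. The sketch below illustrates that general idea with toy unigram models; the function names, the toy data, and the interpolation weight `lam` are illustrative assumptions, not the paper's actual setup.

```python
from collections import Counter

def estimate_unigram(tokens):
    """Maximum-likelihood unigram probabilities from a token list."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

def interpolate(background, adapted, lam=0.8):
    """Linear interpolation of two LMs: P(w) = lam*P_bg(w) + (1-lam)*P_ad(w)."""
    vocab = set(background) | set(adapted)
    return {w: lam * background.get(w, 0.0) + (1 - lam) * adapted.get(w, 0.0)
            for w in vocab}

# Background LM from (toy) broadcast-news text; adaptation LM estimated
# from hypothetical first-pass recognizer output on conversational audio.
bg = estimate_unigram("the news report said the market rose".split())
ad = estimate_unigram("yeah I think the market you know".split())
adapted_lm = interpolate(bg, ad, lam=0.7)
```

Since both input models sum to one, the interpolated model does too; in practice the weight would be tuned (e.g. by minimizing perplexity on the first-pass transcripts), and the same interpolation applies to higher-order N-grams.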