We present an approach to topic mixture-based language model adaptation using latent Dirichlet allocation (LDA). We use probabilistic latent semantic analysis (PLSA) to automatically cluster a heterogeneous training corpus, and train an LDA model using the resulting topic-document assignments. Using this LDA model, we then construct topic-specific corpora at the utterance level for interpolation with a background language model during adaptation. We also present a novel iterative algorithm for LDA topic inference. Preliminary experiments on Mandarin Chinese broadcast news yielded very encouraging results.
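The interpolation step described above can be sketched as a linear mixture: the adapted probability of a word combines the topic-specific models, weighted by inferred topic mixture weights, with the background model. The sketch below is illustrative only; the function name, the unigram representation, and the single interpolation weight `lam` are assumptions, not details from the paper.

```python
# Hypothetical sketch: linear interpolation of topic-specific LMs with a
# background LM, P_adapt(w) = lam * sum_k w_k * P_k(w) + (1 - lam) * P_bg(w).
# Models are represented as unigram dicts for simplicity; a real system
# would interpolate n-gram models.

def interpolate(p_background, topic_models, topic_weights, lam=0.5):
    """Return adapted word probabilities over the background vocabulary."""
    adapted = {}
    for w, p_bg in p_background.items():
        # Mixture of topic LMs, weighted by the inferred topic proportions.
        p_topic = sum(wk * pm.get(w, 0.0)
                      for wk, pm in zip(topic_weights, topic_models))
        adapted[w] = lam * p_topic + (1.0 - lam) * p_bg
    return adapted
```

For example, with a background model `{"a": 0.5, "b": 0.5}`, topic models `[{"a": 1.0}, {"b": 1.0}]`, and topic weights `[0.7, 0.3]`, the adapted distribution shifts probability mass toward words favored by the dominant topic while remaining a proper distribution.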