suzuki06b@interspeech_2006@ISCA

#1 Unsupervised language model adaptation based on automatic text collection from WWW

Authors: Motoyuki Suzuki ; Yasutomo Kajiura ; Akinori Ito ; Shozo Makino

An n-gram language model trained on a general corpus gives high performance. However, it is well known that a topic-specialized n-gram outperforms the general n-gram. Several adaptation methods have been proposed to build a topic-specialized n-gram. These methods either use a given corpus corresponding to the target topic or collect topic-related documents from a database. If neither such a corpus nor topic-related documents in the database are available, the general n-gram cannot be adapted to the topic. This paper proposes a new unsupervised adaptation method that collects topic-related documents from the World Wide Web. Several query terms are extracted from the recognized text, and the web pages returned by a search engine for those queries are used for adaptation. Experimental results showed that the proposed method gave word accuracy 7.2 points higher than the general n-gram.
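
The pipeline described in the abstract (extract query terms from recognized text, collect web pages, adapt the model) might be sketched as follows. This is a minimal illustration, not the authors' implementation: the abstract does not say how query terms are scored or how the collected text is combined with the general model, so TF-IDF ranking and linear interpolation of unigram models are swapped in here as assumptions. All function names and parameters (`select_query_terms`, `interpolate`, `lam`, `k`) are hypothetical, and the search-engine retrieval step is left out; the topic text is assumed to be already downloaded and tokenized.

```python
import math
from collections import Counter


def select_query_terms(recognized_words, background_df, num_docs, k=3):
    """Rank words of the recognized text by TF-IDF and return the top k.

    Assumption: TF-IDF is used only for illustration; the abstract does
    not state the actual selection criterion.
    """
    tf = Counter(recognized_words)

    def score(word):
        # Add-one smoothing so words unseen in the background get a finite IDF.
        idf = math.log((num_docs + 1) / (background_df.get(word, 0) + 1))
        return tf[word] * idf

    # Sort by descending score; break ties alphabetically for determinism.
    ranked = sorted(tf, key=lambda w: (-score(w), w))
    return ranked[:k]


def unigram_probs(words):
    """Maximum-likelihood unigram probabilities from a list of tokens."""
    counts = Counter(words)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}


def interpolate(general, topic, lam=0.5):
    """Linearly interpolate two unigram models.

    Assumption: linear interpolation stands in for whatever adaptation
    scheme the paper uses; lam is the weight of the topic model.
    """
    vocab = set(general) | set(topic)
    return {
        w: lam * topic.get(w, 0.0) + (1 - lam) * general.get(w, 0.0)
        for w in vocab
    }


# Hypothetical first-pass recognition result and background statistics.
recognized = "the stock market fell as the stock index dropped".split()
background_df = {"the": 1000, "as": 900, "stock": 10, "market": 20,
                 "fell": 50, "index": 30, "dropped": 40}
queries = select_query_terms(recognized, background_df, num_docs=1000, k=3)

# Placeholder for tokens from web pages returned for the queries.
web_words = ["stock", "stock", "market", "the"]
general_lm = {"the": 0.5, "stock": 0.1, "cat": 0.4}
adapted_lm = interpolate(general_lm, unigram_probs(web_words), lam=0.5)
```

In a real system the adapted model would be a higher-order n-gram and would be used to re-recognize the speech; the unigram case is used here only to keep the sketch short.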