bocchieri04@interspeech_2004@ISCA


#1 Methods for task adaptation of acoustic models with limited transcribed in-domain data

Authors: Enrico Bocchieri, Michael Riley, Murat Saraclar

Application-specific acoustic models provide the best recognition accuracy, but they are expensive to train because they require the transcription of large amounts of in-domain speech. This paper focuses on acoustic model estimation given limited transcribed in-domain speech data and large amounts of transcribed out-of-domain data. First, we evaluate several combinations of known methods to optimize the adaptation/training of acoustic models on the limited in-domain speech data. Then, we propose Gaussian sharing to combine in-domain models with out-of-domain models, and a data generation process to simulate the presence of more speakers in the in-domain data. In a spoken language dialog application, we compare our methods against an upper accuracy bound of 69.1% (a model trained on a large amount of in-domain data) and a lower bound of 60.8% (no in-domain data). Using only 2 hours of in-domain speech for model estimation, we improve the accuracy by 5.1% absolute (to 65.9%) over the lower bound.
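As a quick arithmetic check on the reported figures, the sketch below computes the absolute gain over the lower bound and the share of the lower-to-upper gap that the 2-hour setup closes. The helper name `gap_recovered` is hypothetical, and "fraction of gap closed" is an illustrative framing of the abstract's numbers, not a metric the authors report.

```python
def gap_recovered(lower: float, upper: float, achieved: float) -> tuple[float, float]:
    """Return (absolute gain over the lower bound, fraction of the lower-to-upper gap closed).

    Hypothetical helper for illustration only; not from the paper.
    """
    gain = achieved - lower
    fraction = gain / (upper - lower)
    return gain, fraction


# Accuracies (%) as reported in the abstract.
LOWER = 60.8     # no in-domain data
UPPER = 69.1     # model trained on a large amount of in-domain data
ACHIEVED = 65.9  # only 2 hours of in-domain speech

gain, fraction = gap_recovered(LOWER, UPPER, ACHIEVED)
print(f"absolute gain: {gain:.1f} points")  # absolute gain: 5.1 points
print(f"gap closed: {fraction:.1%}")        # gap closed: 61.4%
```

Under this reading, 2 hours of in-domain speech recovers roughly 61% of the 8.3-point gap between the no-in-domain-data model and the fully in-domain model.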

Subject: INTERSPEECH.2004 - Speech Recognition