bocchieri04@interspeech_2004@ISCA

#1 Methods for task adaptation of acoustic models with limited transcribed in-domain data

Authors: Enrico Bocchieri ; Michael Riley ; Murat Saraclar

Application-specific acoustic models provide the best recognition accuracy, but they are expensive to train because they require transcribing large amounts of in-domain speech. This paper focuses on acoustic model estimation given limited in-domain transcribed speech data and large amounts of transcribed out-of-domain data. First, we evaluate several combinations of known methods to optimize the adaptation/training of acoustic models on the limited in-domain speech data. Then, we propose Gaussian sharing to combine in-domain models with out-of-domain models, and a data generation process to simulate the presence of more speakers in the in-domain data. In a spoken language dialog application, we contrast our methods against an upper accuracy bound of 69.1% (model trained on a large amount of in-domain data) and a lower bound of 60.8% (no in-domain data). Using only 2 hours of in-domain speech for model estimation, we improve the accuracy by 5.1% absolute (to 65.9%) over the lower bound.
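The reported figures can be put in perspective with a small calculation. The sketch below is not from the paper: it takes the three accuracies quoted in the abstract and works out the absolute gain and the share of the lower-to-upper-bound gap that the 2-hour result closes. The helper name `gap_recovered` is hypothetical, introduced only for this illustration.

```python
def gap_recovered(lower, upper, achieved):
    """Return (absolute gain over lower, fraction of the lower-to-upper gap closed).

    Hypothetical helper for illustration; the inputs are accuracies in percent.
    """
    gain = achieved - lower
    return gain, gain / (upper - lower)


# Accuracies quoted in the abstract (percent).
gain, fraction = gap_recovered(lower=60.8, upper=69.1, achieved=65.9)

print(f"absolute gain: {gain:.1f} points")
print(f"share of gap closed: {fraction:.1%}")
```

Under this reading, the 5.1-point gain covers roughly 61% of the 8.3-point gap between the two bounds.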