On the jointly unsupervised feature vector normalization and acoustic model compensation for robust speech recognition


Authors: Luis Buera, Antonio Miguel, Eduardo Lleida, Óscar Saz, Alfonso Ortega

To compensate for the mismatch between training and testing conditions, an unsupervised hybrid compensation technique is proposed. It combines Multi-Environment Model based LInear Normalization (MEMLIN) with a novel acoustic model adaptation method based on rotation transformations. In a training stage, a set of rotation transformations between clean and MEMLIN-normalized data is estimated by linear regression. Each MEMLIN-normalized frame is then decoded using expanded acoustic models, which are obtained from the reference models and the set of rotation transformations. During the search, one rotation transformation is selected on-line for each frame according to the maximum likelihood (ML) criterion within a modified Viterbi algorithm. Experiments were carried out on the Spanish SpeechDat Car database. Applied to standard ETSI front-end parameters, MEMLIN reaches a mean WER improvement of 75.53%, while the proposed hybrid solution reaches 90.54%.
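The two steps described in the abstract, estimating transformations by linear regression and choosing one per frame by maximum likelihood, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. It assumes an affine map (matrix plus bias) fitted by ordinary least squares, a single diagonal-covariance Gaussian standing in for an acoustic model state, and a log-determinant term so that scores are comparable across transformations. The function names `fit_transform`, `log_gauss_diag`, and `select_transform` are hypothetical, and the modified Viterbi search itself is not shown.

```python
import numpy as np

def fit_transform(normalized, clean):
    """Least-squares fit of clean ≈ normalized @ A.T + b (assumed affine form)."""
    # Append a column of ones so the bias is estimated jointly with the matrix.
    X = np.hstack([normalized, np.ones((normalized.shape[0], 1))])
    W, *_ = np.linalg.lstsq(X, clean, rcond=None)
    return W[:-1].T, W[-1]  # A has shape (dim, dim), b has shape (dim,)

def log_gauss_diag(x, mean, var):
    """Log-density of a diagonal-covariance Gaussian (stand-in for one model state)."""
    return -0.5 * np.sum(np.log(2.0 * np.pi * var) + (x - mean) ** 2 / var)

def select_transform(frame, transforms, mean, var):
    """Return the index of the transformation that maximizes the frame likelihood."""
    scores = []
    for A, b in transforms:
        # log|det A| accounts for the change of variables; it is 0 for a pure rotation.
        _, logdet = np.linalg.slogdet(A)
        scores.append(log_gauss_diag(A @ frame + b, mean, var) + logdet)
    return int(np.argmax(scores))
```

In a full decoder this selection would happen inside the search for every frame and every active state; the sketch only scores one frame against one Gaussian to show the ML choice among candidate transformations.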

Subject: INTERSPEECH.2007 - Speech Recognition

buera07@interspeech_2007@ISCA
