Context-dependent MLPs for LVCSR: TANDEM, hybrid or both?

Authors: Zoltán Tüske, Martin Sundermeyer, Ralf Schlüter, Hermann Ney

Gaussian Mixture Model (GMM) and Multi Layer Perceptron (MLP) based acoustic models are compared on a French large vocabulary continuous speech recognition (LVCSR) task. In addition to optimizing the output layer size of the MLP, the effect of the deep neural network structure is also investigated. Moreover, using different linear transformations (time derivatives, LDA, CMLLR) on conventional MFCC, the study is also extended to MLP based probabilistic and bottle-neck TANDEM features. Results show that using either the hybrid or bottle-neck TANDEM approach leads to similar recognition performance. However, the best performance is achieved when deep MLP acoustic models are trained on concatenated cepstral and context-dependent bottle-neck features. Further experiments reveal the importance of the neighbouring frames in case of MLP based modeling, and that its gain over GMM acoustic models is strongly reduced by more complex features.

Subject: INTERSPEECH.2012 - Speech Recognition

tuske12@interspeech_2012@ISCA
