Modular combination of deep neural networks for acoustic modeling

Authors: Jonas Gehring, Wonkyum Lee, Kevin Kilgour, Ian Lane, Yajie Miao, Alex Waibel

In this work, we propose a modular combination of two popular applications of neural networks to large-vocabulary continuous speech recognition. First, a deep neural network is trained to extract bottleneck features from frames of mel-scale filterbank coefficients. As is commonly done for GMM/HMM systems, this network is then applied as a non-linear discriminative feature-space transformation for a hybrid setup in which acoustic modeling is performed by a deep belief network. This effectively results in a very large network, where the layers of the bottleneck network are fixed and applied to successive windows of feature frames in a time-delay fashion. We show that bottleneck features improve the recognition performance of DBN/HMM hybrids, and that the modular combination enables the acoustic model to benefit from a larger temporal context. Our architecture is evaluated on a recently released and challenging Tagalog corpus containing conversational telephone speech.
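
The forward pass through the combined architecture described above can be sketched roughly as follows, in plain Python with no dependencies. This is an illustrative sketch, not the authors' implementation: weights are randomly initialised in place of trained ones, and every size (40 filterbank coefficients, an 11-frame bottleneck input window, a 42-unit bottleneck, five bottleneck windows at offsets of 0, ±5 and ±10 frames, 100 HMM states) is a hypothetical choice, not the configuration used in the paper. Training, the layers above the bottleneck that are needed only while training the feature extractor, and the generative pretraining that makes the acoustic model a DBN are all omitted.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def identity(z):
    return z


def softmax(zs):
    m = max(zs)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in zs]
    total = sum(exps)
    return [e / total for e in exps]


def dense(x, layer, activation):
    """Fully connected layer: activation(W x + b), applied element-wise."""
    weights, biases = layer
    return [activation(sum(w * xi for w, xi in zip(row, x)) + b)
            for row, b in zip(weights, biases)]


def random_layer(n_in, n_out, rng):
    """Hypothetical random initialisation; stands in for trained weights."""
    weights = [[rng.gauss(0.0, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def context_window(frames, t, left, right):
    """Concatenate frames t-left..t+right, clamping indices at utterance edges."""
    n = len(frames)
    out = []
    for k in range(t - left, t + right + 1):
        out.extend(frames[min(max(k, 0), n - 1)])
    return out


class BottleneckNet:
    """Frozen feature extractor: sigmoid hidden layers, then a narrow linear
    bottleneck layer whose activations are used as features."""

    def __init__(self, n_in, hidden_sizes, bottleneck_size, rng):
        sizes = [n_in] + hidden_sizes
        self.hidden = [random_layer(a, b, rng) for a, b in zip(sizes, sizes[1:])]
        self.bottleneck = random_layer(sizes[-1], bottleneck_size, rng)

    def __call__(self, x):
        for layer in self.hidden:
            x = dense(x, layer, sigmoid)
        return dense(x, self.bottleneck, identity)


def modular_input(frames, t, bnf, left, right, offsets):
    """Apply the same frozen bottleneck net to windows centred at t+o for each
    offset o (time-delay style) and concatenate the bottleneck outputs."""
    n = len(frames)
    feats = []
    for o in offsets:
        centre = min(max(t + o, 0), n - 1)
        feats.extend(bnf(context_window(frames, centre, left, right)))
    return feats


def effective_context(left, right, offsets):
    """Frames seen by the combined network, relative to the current frame."""
    return min(offsets) - left, max(offsets) + right


def dbn_posteriors(x, hidden_layers, output_layer):
    """Hybrid acoustic model: sigmoid hidden layers, softmax over HMM states."""
    for layer in hidden_layers:
        x = dense(x, layer, sigmoid)
    return softmax(dense(x, output_layer, identity))


# Hypothetical sizes, chosen only to keep the example small and fast.
rng = random.Random(0)
N_FBANK, LEFT, RIGHT, BN_SIZE, N_STATES = 40, 5, 5, 42, 100
OFFSETS = [-10, -5, 0, 5, 10]

frames = [[rng.gauss(0.0, 1.0) for _ in range(N_FBANK)] for _ in range(50)]
bnf = BottleneckNet(N_FBANK * (LEFT + RIGHT + 1), [64, 64], BN_SIZE, rng)

x = modular_input(frames, 25, bnf, LEFT, RIGHT, OFFSETS)
dbn_hidden = [random_layer(len(x), 64, rng), random_layer(64, 64, rng)]
dbn_out = random_layer(64, N_STATES, rng)
posteriors = dbn_posteriors(x, dbn_hidden, dbn_out)

print(len(x), effective_context(LEFT, RIGHT, OFFSETS))  # 210 (-15, 15)
```

Because the one frozen bottleneck network is shared across all windows, each application sees only 11 frames, yet the acoustic model's input spans 31 frames (±15). Under these assumptions, widening the offsets enlarges the temporal context available to the acoustic model without retraining the feature extractor.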

Subject: INTERSPEECH.2013 - Speech Recognition

gehring13@interspeech_2013@ISCA