Bilinear map of filter-bank outputs for DNN-based speech recognition

ogawa15@interspeech_2015@ISCA


Authors: Tetsuji Ogawa, Kenshiro Ueda, Kouichi Katsurada, Tetsunori Kobayashi, Tsuneo Nitta

Filter-bank outputs are extended into tensors to yield precise acoustic features for speech recognition using deep neural networks (DNNs). Filter-bank outputs with temporal context form a time-frequency pattern of speech and have been shown to be effective as feature parameters for DNN-based acoustic models. We project the filter-bank outputs onto a tensor product space, using decorrelation followed by a bilinear map, to improve acoustic separability in feature extraction. This extension makes it possible to extract a more precise structure of the time-frequency pattern, because the bilinear map yields higher-order correlations of the features. Experimental comparisons in phoneme recognition demonstrate that the tensor feature yields results comparable to those of the filter-bank feature, and that fusing the two features improves on each feature alone.
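
To make the bilinear-map step concrete, here is a minimal sketch of one common bilinear map, the outer product of two vectors, flattened into a single feature vector. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not say which vectors are paired or how decorrelation is performed, so the decorrelation step is omitted and the names `outer_product`, `tensor_feature`, `freq_part`, and `time_part` are hypothetical.

```python
from typing import List, Sequence


def outer_product(u: Sequence[float], v: Sequence[float]) -> List[List[float]]:
    """Bilinear map (u, v) -> outer product, as a len(u) x len(v) matrix."""
    return [[ui * vj for vj in v] for ui in u]


def tensor_feature(u: Sequence[float], v: Sequence[float]) -> List[float]:
    """Flatten the outer product row by row into one feature vector.

    Each entry is a product u[i] * v[j], i.e. a second-order term that a
    purely linear projection of u and v would not contain.
    """
    return [x for row in outer_product(u, v) for x in row]


# Hypothetical toy inputs, assumed to be already decorrelated
# (the decorrelation step itself is not shown here).
freq_part = [1.0, 2.0, 3.0]
time_part = [0.5, -1.0]

feature = tensor_feature(freq_part, time_part)
print(len(feature))  # dimension is len(freq_part) * len(time_part)
```

Because the output dimension is the product of the input dimensions, the tensor feature grows quickly with input size. The map is linear in each argument separately: scaling `freq_part` by a constant scales every entry of `feature` by the same constant.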

Subject: INTERSPEECH.2015 - Others
