ma17@interspeech_2017@ISCA

Total: 1

#1 Improving Deliverable Speech-to-Text Systems with Multilingual Knowledge Transfer [PDF] [Copy] [Kimi1]

Authors: Jeff Ma ; Francis Keith ; Tim Ng ; Man-Hung Siu ; Owen Kimball

This paper reports our recent progress on using multilingual data for improving speech-to-text (STT) systems that can be easily delivered. We continued the work BBN conducted on the use of multilingual data for improving Babel evaluation systems, but focused on training time-delay neural network (TDNN) based chain models. As done for the Babel evaluations, we used multilingual data in two ways: first, to train multilingual deep neural networks (DNN) for extracting bottle-neck (BN) features, and second, for initializing training on target languages. Our results show that TDNN chain models trained on multilingual DNN bottleneck features yield significant gains over their counterparts trained on MFCC plus i-vector features. By initializing from models trained on multilingual data, TDNN chain models can achieve great improvements over random initializations of the network weights on target languages. Two other important findings are: 1) initialization with multilingual TDNN chain models produces larger gains on target languages that have less training data; 2) inclusion of target languages in multilingual training for either BN feature extraction or initialization have limited impact on performance measured on the target languages. Our results also reveal that for TDNN chain models, the combination of multilingual BN features and multilingual initialization achieves the best performance on all target languages.