kurata16@interspeech_2016@ISCA

Total: 1

#1 Improved Neural Network Initialization by Grouping Context-Dependent Targets for Acoustic Modeling [PDF] [Copy] [Kimi1]

Authors: Gakuto Kurata ; Brian Kingsbury

Neural Network (NN) Acoustic Models (AMs) are usually trained using context-dependent Hidden Markov Model (CD-HMM) states as independent targets. For example, the CD-HMM states of A-b-2 (second variant of beginning state of A) and A-m-1 (first variant of middle state of A) both correspond to the phone A, and A-b-1 and A-b-2 both correspond to the Context-independent HMM (CI-HMM) state A-b, but this relationship is not explicitly modeled. We propose a method that treats some neurons in the final hidden layer just below the output layer as dedicated neurons for phones or CI-HMM states by initializing connections between the dedicated neurons and the corresponding CD-HMM outputs with stronger weights than to other outputs. We obtained 6.5% and 3.6% relative error reductions with a DNN AM and a CNN AM, respectively, on a 50-hour English broadcast news task and 4.6% reduction with a CNN AM on a 500-hour Japanese task, in all cases after Hessian-free sequence training. Our proposed method only changes the NN parameter initialization and requires no additional computation in NN training or speech recognition run-time.