Deep Neural Network Based Acoustic-to-Articulatory Inversion Using Phone Sequence Information

Authors: Xurong Xie, Xunying Liu, Lan Wang

In recent years, neural network based acoustic-to-articulatory inversion approaches have achieved state-of-the-art performance. One major issue with these approaches is the lack of phone sequence information during inversion. To address this issue, this paper proposes an improved architecture that hierarchically concatenates phone classification and articulatory inversion component DNNs to improve articulatory movement generation. On a Mandarin Chinese speech inversion task, the proposed technique consistently outperformed a range of baseline DNN and RNN inversion systems constructed using no phone sequence information, a mixture density parameter output layer, additional phone features at the input layer, or multi-task learning with additional monophone output layer target labels, as measured by electromagnetic articulography (EMA) root mean square error (RMSE) and correlation. Further improvements were obtained by using the bottleneck features extracted from the proposed hierarchical articulatory inversion systems as auxiliary features in generalized variable parameter HMM (GVP-HMM) based inversion systems.
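
The two evaluation measures named in the abstract, RMSE and correlation, can be illustrated with a minimal sketch for a single articulator channel. The function names, the toy trajectories, and the choice of Pearson correlation are assumptions made for illustration; the abstract does not state how the paper aggregates scores across EMA channels or utterances.

```python
import math


def rmse(predicted, reference):
    """Root mean square error between two equal-length trajectories."""
    if len(predicted) != len(reference) or not predicted:
        raise ValueError("trajectories must be non-empty and equal in length")
    squared_errors = [(p - r) ** 2 for p, r in zip(predicted, reference)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def pearson_correlation(predicted, reference):
    """Pearson correlation coefficient between two equal-length trajectories."""
    if len(predicted) != len(reference) or not predicted:
        raise ValueError("trajectories must be non-empty and equal in length")
    n = len(predicted)
    mean_p = sum(predicted) / n
    mean_r = sum(reference) / n
    covariance = sum((p - mean_p) * (r - mean_r)
                     for p, r in zip(predicted, reference))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_r = sum((r - mean_r) ** 2 for r in reference)
    if var_p == 0 or var_r == 0:
        raise ValueError("correlation is undefined for a constant trajectory")
    return covariance / math.sqrt(var_p * var_r)


# Hypothetical toy data: one EMA coordinate over four frames (not from the paper).
reference = [1.0, 2.0, 3.0, 4.0]
predicted = [1.5, 2.5, 3.5, 4.5]

error = rmse(predicted, reference)
correlation = pearson_correlation(predicted, reference)
```

In this toy case the prediction follows the shape of the reference exactly but is offset by a constant, so the correlation is high while the RMSE is nonzero. This is one reason inversion results are commonly reported with both measures.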

Subject: INTERSPEECH.2016 - Speech Synthesis

xie16c@interspeech_2016@ISCA
