Generalizing continuous-space translation of paralinguistic information

Authors: Takatomo Kano, Shinnosuke Takamichi, Sakriani Sakti, Graham Neubig, Tomoki Toda, Satoshi Nakamura

In previous work, we proposed a model for speech-to-speech translation that is sensitive to paralinguistic information such as the duration and power of spoken words. This model uses linear regression to map source acoustic features to target acoustic features directly and in continuous space. However, while the model is effective, it faces scalability issues because a separate model must be trained for every word, which makes it difficult to generalize to words for which we do not have parallel speech. In this work, we first demonstrate that simply training a linear regression model on all words is not sufficient to express paralinguistic translation. We next describe a neural network model that has sufficient expressive power to perform paralinguistic translation with a single model. We evaluate the proposed method on a digit translation task and show that a single neural network-based model achieves results similar to those that previous work obtained with word-dependent models.
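The contrast between word-dependent regression and a single regression pooled over all words can be illustrated with a toy sketch. This is not the authors' implementation: the word labels, the scalar "duration" feature, and all numeric values below are synthetic assumptions chosen for illustration, and the real models map multi-dimensional acoustic feature vectors. The sketch fits one least-squares line per word and one line over all words, then compares their errors.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a*x + b with a scalar input x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    var_x = sum((x - mean_x) ** 2 for x in xs)
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = cov_xy / var_x
    b = mean_y - a * mean_x
    return a, b


def mse(model, xs, ys):
    """Mean squared error of a fitted line on paired samples."""
    a, b = model
    return sum((a * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Synthetic toy data (assumption): source-side durations in seconds, and
# target-side durations generated by a different linear map for each word.
source = [0.2, 0.3, 0.4, 0.5]
data = {
    "one": (source, [1.5 * x + 0.05 for x in source]),
    "two": (source, [0.8 * x + 0.20 for x in source]),
}

# Word-dependent setting: one regression model per word.
word_models = {word: fit_line(xs, ys) for word, (xs, ys) in data.items()}
word_dependent_mse = sum(
    mse(word_models[word], xs, ys) for word, (xs, ys) in data.items()
) / len(data)

# Pooled setting: a single regression model trained on all words together.
all_x = [x for xs, _ in data.values() for x in xs]
all_y = [y for _, ys in data.values() for y in ys]
pooled_model = fit_line(all_x, all_y)
pooled_mse = mse(pooled_model, all_x, all_y)

print("word-dependent MSE:", word_dependent_mse)
print("pooled MSE:", pooled_mse)
```

In this toy setting the per-word lines fit their data almost exactly, while the pooled line cannot, because one linear map has to compromise between two different word-specific maps. A model that also conditions on word identity and has nonlinear capacity, such as the single neural network described in the abstract, is one way to avoid that compromise.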

Subject: INTERSPEECH.2013 - Speech Processing

kano13@interspeech_2013@ISCA
