hashimoto17@interspeech_2017@ISCA


#1 Parallel-Data-Free Many-to-Many Voice Conversion Based on DNN Integrated with Eigenspace Using a Non-Parallel Speech Corpus

Authors: Tetsuya Hashimoto ; Hidetsugu Uchida ; Daisuke Saito ; Nobuaki Minematsu

This paper proposes a novel approach to parallel-data-free, many-to-many voice conversion (VC). Because one-to-one conversion is less flexible, researchers have focused on many-to-many conversion, in which speaker identity is often represented using the bases of a speaker space. In this case, utterances of the same sentences must be collected from many speakers. This study aims to overcome this constraint and realize parallel-data-free, many-to-many conversion by integrating deep neural networks (DNNs) with an eigenspace built from a non-parallel speech corpus. In our previous study, many-to-many conversion was implemented using a DNN whose training was assisted by eigenvoice Gaussian mixture model (EVGMM) conversion. Here, the function of the EVGMM is realized equivalently by constructing the eigenspace from a non-parallel speech corpus, which makes the desired conversion possible. A key technique is estimating the covariance terms without parallel data between the source and target speakers. Experiments show that the objective assessment scores are comparable to those of the baseline system trained with parallel data.
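The idea of representing speaker identity with speaker space bases can be illustrated with a minimal sketch. It assumes each speaker is summarized by a fixed-length "supervector" (for example, stacked per-speaker mean vectors), and it builds the eigenspace by plain principal component analysis using power iteration. The function names (`build_eigenspace`, `speaker_weights`, `reconstruct`) and the toy data are hypothetical. The sketch does not cover the paper's covariance-term estimation or the DNN, and it is not the authors' implementation.

```python
import math
import random


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def top_direction(rows, iters=200, seed=0):
    """Dominant principal direction of centered rows via power iteration."""
    rng = random.Random(seed)
    dim = len(rows[0])
    v = [rng.uniform(-1.0, 1.0) for _ in range(dim)]
    for _ in range(iters):
        # Apply the (unnormalized) covariance: C v = sum_s x_s (x_s . v)
        proj = [dot(x, v) for x in rows]
        w = [sum(p * x[d] for p, x in zip(proj, rows)) for d in range(dim)]
        norm = math.sqrt(dot(w, w))
        if norm == 0.0:  # no variance left in the data
            break
        v = [c / norm for c in w]
    return v


def build_eigenspace(supervectors, num_bases):
    """Return (bias, bases): the mean supervector and orthonormal bases."""
    n = len(supervectors)
    bias = [sum(col) / n for col in zip(*supervectors)]
    residual = [[x - b for x, b in zip(sv, bias)] for sv in supervectors]
    bases = []
    for _ in range(num_bases):
        v = top_direction(residual)
        bases.append(v)
        # Deflate: remove the component along v before finding the next basis.
        residual = [[x - dot(row, v) * vd for x, vd in zip(row, v)]
                    for row in residual]
    return bias, bases


def speaker_weights(supervector, bias, bases):
    """Project one speaker's supervector onto the eigenspace."""
    centered = [x - b for x, b in zip(supervector, bias)]
    return [dot(centered, v) for v in bases]


def reconstruct(weights, bias, bases):
    """Rebuild a supervector as bias + sum_k weights[k] * bases[k]."""
    out = list(bias)
    for w, v in zip(weights, bases):
        out = [o + w * vd for o, vd in zip(out, v)]
    return out


# Toy data (assumption): five speakers whose supervectors vary along one axis.
toy_speakers = [[1 + 2 * t, 2 + t, 3 - t] for t in (-2, -1, 0, 1, 2)]
toy_bias, toy_bases = build_eigenspace(toy_speakers, num_bases=1)
toy_weights = speaker_weights(toy_speakers[0], toy_bias, toy_bases)
toy_rebuilt = reconstruct(toy_weights, toy_bias, toy_bases)
```

In this representation, an arbitrary speaker is described by a small weight vector rather than by utterances paired with another speaker, which is what makes a many-to-many formulation convenient.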