Voice conversion for non-parallel datasets using dynamic kernel partial least squares regression

Authors: Hanna Silén, Jani Nurminen, Elina Helander, Moncef Gabbouj

Voice conversion aims to convert speech from one speaker so that it sounds as if it were spoken by another specific speaker. The most popular voice conversion approach, based on Gaussian mixture modeling, tends to suffer from either model overfitting or oversmoothing. To overcome the shortcomings of this traditional approach, we recently proposed using dynamic kernel partial least squares (DKPLS) regression in the framework of parallel-data voice conversion. However, the availability of parallel training data from both the source and target speakers is not always guaranteed. In this paper, we extend the DKPLS-based conversion approach to non-parallel data by combining it with the well-known INCA alignment algorithm. The listening test results indicate that high-quality conversion can be achieved with the proposed combination. Furthermore, the performance of two variations of INCA is evaluated with both intra-lingual and cross-lingual data.

Subject: INTERSPEECH.2013 - Speech Synthesis

silen13@interspeech_2013@ISCA
