Voice Conversion Based on Matrix Variate Gaussian Mixture Model Using Multiple Frame Features

Authors: Yi Yang, Hidetsugu Uchida, Daisuke Saito, Nobuaki Minematsu

This paper presents a novel voice conversion method based on a matrix variate Gaussian mixture model (MV-GMM) using features of multiple frames. In voice conversion studies, approaches based on Gaussian mixture models (GMMs) are still widely used because of their flexibility and ease of handling. They model the joint probability density function (PDF) of the source and target speakers' feature vectors as the PDF of joint vectors formed by concatenating the two. Adding dynamic features to the feature vectors in GMM-based approaches yields a certain performance improvement because the correlation between multiple frames is taken into account. Recently, a voice conversion framework based on MV-GMM has been proposed, in which the joint PDF is modeled in a matrix variate space; it can precisely model both the characteristics of the feature spaces and the relation between the source and target speakers. In this paper, in order to additionally model the correlation between multiple frames more consistently within that framework, the MV-GMM is constructed in a matrix variate space containing the features of neighboring frames. Experimental results show a certain performance improvement in both objective and subjective evaluations.
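
As a rough illustration of what "a matrix variate space containing the features of neighboring frames" could look like, the sketch below arranges each frame and its neighbors into a small matrix whose rows are feature dimensions and whose columns are consecutive frames. The abstract does not specify the actual construction, so the function name `frame_context_matrices`, the `context` parameter, the row/column layout, and the edge handling (repeating the first or last frame) are all assumptions made for illustration, not the authors' method.

```python
from typing import List

Frame = List[float]          # one feature vector (e.g. spectral coefficients)
Matrix = List[List[float]]   # rows: feature dimensions, columns: frames


def frame_context_matrices(frames: List[Frame], context: int = 1) -> List[Matrix]:
    """Build one matrix per frame from that frame and its neighbors.

    Hypothetical helper: for frame t, the columns are frames
    t-context .. t+context. Indices past either end are clamped,
    so edge frames are repeated (an assumed padding choice).
    """
    if not frames:
        return []
    last = len(frames) - 1
    matrices = []
    for t in range(len(frames)):
        # Collect the neighboring frames, clamping indices at the edges.
        window = [frames[min(max(t + k, 0), last)]
                  for k in range(-context, context + 1)]
        # Transpose so that rows are feature dimensions and columns are frames.
        matrices.append([list(row) for row in zip(*window)])
    return matrices


# Toy input: three frames of two-dimensional features.
frames = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]
mats = frame_context_matrices(frames, context=1)
```

With this toy input, `mats[1]` holds the three frames side by side as a 2 x 3 matrix. A matrix variate model would then be fitted on such matrices rather than on single vectors with appended dynamic features.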

Subject: INTERSPEECH.2016 - Speech Synthesis

yang16b@interspeech_2016@ISCA
