Physically Constrained Statistical F0 Prediction for Electrolaryngeal Speech Enhancement

Authors: Kou Tanaka, Hirokazu Kameoka, Tomoki Toda, Satoshi Nakamura

Electrolaryngeal (EL) speech, produced by a laryngectomee using an electrolarynx to mechanically generate artificial excitation sounds, suffers severely from unnatural fundamental frequency (F0) patterns caused by monotonic excitation sounds. To address this issue, we have previously proposed EL speech enhancement systems using statistical F0 pattern prediction methods based on a Gaussian Mixture Model (GMM), making it possible to predict the underlying F0 pattern of EL speech from its spectral feature sequence. Our previous work revealed that the naturalness of the predicted F0 pattern can be improved by incorporating a physically based generative model of F0 patterns into the GMM-based statistical F0 prediction system within a Product-of-Experts framework. However, one drawback of this method is that it requires an iterative procedure to obtain a predicted F0 pattern, making it difficult to realize a real-time system. In this paper, we propose another approach to physically based statistical F0 pattern prediction using an HMM-GMM framework. This approach is noteworthy in that it generates an F0 pattern that is both statistically likely and physically natural without iterative procedures. Experimental results demonstrated that the proposed method generates F0 patterns more similar to those of normal speech than the conventional GMM-based method does.
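
As a rough illustration of what GMM-based prediction of F0 from a spectral feature can look like, the sketch below computes a frame-wise conditional mean under a joint GMM over a spectral feature vector and a scalar F0 value. This is a generic textbook formulation and not the HMM-GMM method proposed in the paper. The function name `gmm_predict_f0`, the parameter layout, and the toy numbers are all assumptions made for illustration.

```python
import numpy as np

def gmm_predict_f0(x, weights, means, covs, dx):
    """Conditional mean E[y | x] under a joint GMM over z = [x; y].

    Hypothetical layout (an assumption, not taken from the paper):
      x       : (dx,) spectral feature vector for one frame
      weights : (M,) mixture weights
      means   : (M, dx + 1) joint means, last entry is the F0 mean
      covs    : (M, dx + 1, dx + 1) joint covariance matrices
    """
    n_mix = len(weights)
    log_resp = np.empty(n_mix)
    cond_means = np.empty(n_mix)
    for m in range(n_mix):
        mu_x, mu_y = means[m, :dx], means[m, dx]
        s_xx = covs[m, :dx, :dx]   # covariance of the spectral part
        s_yx = covs[m, dx, :dx]    # cross-covariance between F0 and spectrum
        diff = x - mu_x
        sol = np.linalg.solve(s_xx, diff)  # s_xx^{-1} (x - mu_x)
        _, logdet = np.linalg.slogdet(s_xx)
        # Log of weight times the Gaussian marginal density of x.
        log_resp[m] = np.log(weights[m]) - 0.5 * (
            diff @ sol + logdet + dx * np.log(2.0 * np.pi)
        )
        # Per-component conditional mean of y given x.
        cond_means[m] = mu_y + s_yx @ sol
    # Normalize in the log domain to get posteriors P(m | x).
    resp = np.exp(log_resp - log_resp.max())
    resp /= resp.sum()
    return float(resp @ cond_means)

# Toy parameters (made up): one mixture component, 2-dim spectral feature.
toy_weights = np.array([1.0])
toy_means = np.array([[0.0, 0.0, 5.0]])
toy_covs = np.array([[[1.0, 0.0, 0.5],
                      [0.0, 1.0, 0.0],
                      [0.5, 0.0, 1.0]]])
predicted = gmm_predict_f0(np.array([2.0, 0.0]), toy_weights, toy_means, toy_covs, dx=2)
```

Because the mapping is applied independently to each frame, nothing in this sketch enforces a physically plausible F0 contour over time, which is the gap the physically constrained approaches described in the abstract aim to close.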

Subject: INTERSPEECH.2017 - Speech Synthesis

tanaka17@interspeech_2017@ISCA
