shao04@interspeech_2004@ISCA

#1 MAP prediction of pitch from MFCC vectors for speech reconstruction

Authors: Xu Shao ; Ben P. Milner

This work proposes a method of predicting pitch and voicing from mel-frequency cepstral coefficient (MFCC) vectors. Two maximum a posteriori (MAP) methods are considered. The first models the joint distribution of the MFCC vector and pitch using a Gaussian mixture model (GMM), while the second also models the temporal correlation of the pitch contour using a combined hidden Markov model (HMM)-GMM framework. Monophone-based HMMs are connected in an unconstrained monophone grammar, which enables pitch to be predicted from unconstrained speech input. Evaluation on 130,000 MFCC vectors reveals a voicing classification accuracy of over 92% and an RMS pitch error of 10 Hz. The predicted pitch contour is also applied to MFCC-based speech reconstruction, with the resultant speech almost indistinguishable from that reconstructed using a reference pitch.
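
The first (GMM-only) method can be illustrated with a minimal sketch. It assumes a joint GMM has already been trained over the augmented vector [MFCC; pitch], and it uses one common form of MAP prediction: select the mixture component with the highest posterior given the MFCC vector, then return that component's conditional mean of pitch. The abstract does not give the exact estimator, so this form, the function names `log_gauss` and `predict_pitch_map`, and the parameter layout are all assumptions made for illustration. Voicing classification and the HMM-based temporal modelling are not covered.

```python
import numpy as np


def log_gauss(x, mean, cov):
    """Log density of a multivariate Gaussian evaluated at x."""
    d = x.shape[0]
    diff = x - mean
    _, logdet = np.linalg.slogdet(cov)
    maha = diff @ np.linalg.solve(cov, diff)
    return -0.5 * (d * np.log(2.0 * np.pi) + logdet + maha)


def predict_pitch_map(x, weights, means, covs):
    """Predict pitch from one MFCC vector x using a joint GMM.

    Hypothetical layout (an assumption, not taken from the paper):
    each component models z = [x; f], where x has d MFCC dimensions
    and f is a scalar pitch value, so each mean has length d + 1 and
    each covariance is (d + 1) x (d + 1).
    Returns (predicted_pitch, index_of_selected_component).
    """
    x = np.asarray(x, dtype=float)
    d = x.shape[0]
    n_comp = len(weights)
    log_post = np.empty(n_comp)
    cond_means = np.empty(n_comp)
    for k in range(n_comp):
        mu = np.asarray(means[k], dtype=float)
        S = np.asarray(covs[k], dtype=float)
        mu_x, mu_f = mu[:d], mu[d]
        S_xx, S_fx = S[:d, :d], S[d, :d]
        # Unnormalised log posterior of component k given x
        # (marginal of the joint Gaussian over the MFCC dimensions).
        log_post[k] = np.log(weights[k]) + log_gauss(x, mu_x, S_xx)
        # Conditional mean of pitch given x under component k.
        cond_means[k] = mu_f + S_fx @ np.linalg.solve(S_xx, x - mu_x)
    k_best = int(np.argmax(log_post))
    return float(cond_means[k_best]), k_best
```

A posterior-weighted sum of the per-component conditional means is an alternative to picking the single most probable component; which one a given system uses is a design choice, and the sketch picks the simpler of the two.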