Conditionally linear Gaussian models for estimating vocal tract resonances

Authors: Daniel Rudoy, Daniel N. Spendley, Patrick J. Wolfe

Vocal tract resonances play a central role in the perception and analysis of speech. Here we consider the canonical task of estimating such resonances from an observed acoustic waveform, and formulate it as a statistical model-based tracking problem. In this vein, Deng and colleagues recently showed that a robust linearization of the formant-to-cepstrum map enables the effective use of a Kalman filtering framework. We extend this model both to account for the uncertainty of speech presence, by way of a censored likelihood formulation, and to model formant cross-correlation explicitly via a vector autoregression; in doing so, we retain a conditionally linear and Gaussian framework amenable to efficient estimation schemes. We provide evaluations using a recently introduced public database of formant trajectories, for which results indicate improvements of 20% to over 30% per formant in root mean square error, relative to a contemporary benchmark formant analysis tool.
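The tracking machinery the abstract builds on, a Kalman filter whose hidden state is the vector of formant frequencies, can be illustrated with a minimal sketch. It assumes a first-order vector autoregression (VAR(1)) around a mean formant vector `mu` for the dynamics, and an already-linearized observation map y ≈ Hx + b standing in for the linearized formant-to-cepstrum map. The function name `kalman_step` and all parameter values are hypothetical and not taken from the paper. Frames flagged as non-speech simply skip the measurement update, which is a cruder device than the censored likelihood formulation the authors describe.

```python
import numpy as np


def kalman_step(m, P, y, mu, A, Q, H, b, R, speech_present=True):
    """One predict/update cycle of a linear-Gaussian formant tracker (illustrative sketch).

    State model (VAR(1) around a mean mu):  x_t = mu + A (x_{t-1} - mu) + w_t,  w_t ~ N(0, Q)
    Observation model (linearized map):     y_t = H x_t + b + v_t,              v_t ~ N(0, R)

    m, P : posterior mean and covariance of the formant vector at frame t-1
    y    : cepstral feature vector observed at frame t (ignored if speech is absent)
    speech_present : if False, only the prediction step is run; this is a
                     simplification, not the paper's censored-likelihood treatment
    """
    # Predict. Off-diagonal entries of A couple the formants to one another.
    m_pred = mu + A @ (m - mu)
    P_pred = A @ P @ A.T + Q
    if not speech_present:
        return m_pred, P_pred

    # Update with the linearized formant-to-cepstrum observation.
    S = H @ P_pred @ H.T + R                  # innovation covariance
    K = np.linalg.solve(S, H @ P_pred).T      # Kalman gain, P_pred H^T S^{-1}
    innovation = y - (H @ m_pred + b)
    m_new = m_pred + K @ innovation
    P_new = P_pred - K @ H @ P_pred
    P_new = 0.5 * (P_new + P_new.T)           # guard against round-off asymmetry
    return m_new, P_new


# Toy parameters for three formants (F1, F2, F3) in Hz and a 4-dimensional
# observation; all values are made up for illustration only.
mu = np.array([500.0, 1500.0, 2500.0])
A = np.array([[0.90, 0.05, 0.00],
              [0.05, 0.90, 0.05],
              [0.00, 0.05, 0.90]])
Q = 100.0 * np.eye(3)
H = 1e-3 * np.array([[1.0, 0.0, 0.0],
                     [0.0, 1.0, 0.0],
                     [0.0, 0.0, 1.0],
                     [1.0, 1.0, 1.0]])
b = np.zeros(4)
R = 1e-4 * np.eye(4)

# Start at the prior mean with a broad covariance, then process one noise-free frame.
m, P = mu.copy(), 1e4 * np.eye(3)
y = H @ np.array([550.0, 1450.0, 2600.0]) + b
m, P = kalman_step(m, P, y, mu, A, Q, H, b, R)
```

A full (non-diagonal) `A` is what lets the filter exploit cross-correlation between formants; replacing it with a diagonal matrix recovers independent per-formant dynamics, closer to a model without the vector autoregression.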

Subject: INTERSPEECH.2007 - Analysis and Assessment

rudoy07@interspeech_2007@ISCA
