pradhan17@interspeech_2017@ISCA

Total: 1

#1 Excitation Source Features for Improving the Detection of Vowel Onset and Offset Points in a Speech Sequence

Authors: Gayadhar Pradhan ; Avinash Kumar ; S. Shahnawazuddin

Detecting the vowel regions in a given speech signal is a challenging problem. Over the years, several works on accurate detection of vowel regions and the corresponding vowel onset points (VOPs) and vowel end points (VEPs) have been reported. This paper proposes a novel front-end feature extraction technique that exploits the temporal and spectral characteristics of the excitation source information in the speech signal to improve the detection of vowel regions, VOPs and VEPs. To this end, a three-class classifier (vowel, non-vowel and silence) is developed on the TIMIT database using the proposed features as well as mel-frequency cepstral coefficients (MFCC). Statistical modeling based on a deep neural network is employed for learning the parameters. Using the developed three-class classifier, a given speech sample is then force-aligned against the trained acoustic models to detect the vowel regions. The vowel regions detected with the proposed features differ considerably from those obtained with the MFCC. Exploiting the differences in the evidence obtained from the two kinds of features, a technique for combining the evidence is also proposed in order to obtain a better estimate of the VOPs and VEPs.
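As a rough illustration of the last detection step, the sketch below reads VOPs and VEPs off a frame-level label sequence such as a forced alignment might produce. It is not the authors' implementation: the function names `vowel_runs` and `runs_to_seconds`, the label strings, and the 10 ms frame shift are all assumptions made for this example, and the abstract does not specify how the alignment output is represented or how the two feature streams are combined.

```python
from typing import List, Tuple


def vowel_runs(labels: List[str]) -> List[Tuple[int, int]]:
    """Return (onset_frame, end_frame) for each run of 'vowel' labels.

    onset_frame is the first vowel frame of a run (the VOP);
    end_frame is the index just past its last vowel frame (the VEP).
    The label strings are hypothetical, not taken from the paper.
    """
    runs = []
    start = None  # index of the first frame of the current vowel run
    for i, label in enumerate(labels):
        if label == "vowel" and start is None:
            start = i  # vowel onset
        elif label != "vowel" and start is not None:
            runs.append((start, i))  # vowel end
            start = None
    if start is not None:
        # The sequence ended while still inside a vowel region.
        runs.append((start, len(labels)))
    return runs


def runs_to_seconds(
    runs: List[Tuple[int, int]], frame_shift_s: float = 0.01
) -> List[Tuple[float, float]]:
    """Convert frame indices to times, assuming a fixed frame shift."""
    return [(s * frame_shift_s, e * frame_shift_s) for s, e in runs]


if __name__ == "__main__":
    alignment = ["silence", "silence", "vowel", "vowel", "vowel",
                 "non-vowel", "vowel", "vowel"]
    for onset, end in runs_to_seconds(vowel_runs(alignment)):
        print(f"VOP at {onset:.2f} s, VEP at {end:.2f} s")
```

Working in frame indices first and converting to seconds afterwards keeps the boundary logic independent of the frame shift, which would depend on the front-end configuration actually used.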