Comparing time-frequency representations for directional derivative features

Authors: James Gibson, Maarten Van Segbroeck, Shrikanth S. Narayanan

We compare the performance of Directional Derivative features for automatic speech recognition when they are extracted from different time-frequency representations. Specifically, we use the short-time Fourier transform, Mel-frequency, and Gammatone spectrograms as bases from which we extract spectro-temporal modulations. We then assess the noise robustness of each representation with a varying number of frequency bins and different dynamic range compression schemes, for both word and phone recognition. We find that the choice of dynamic range compression has the most significant impact on recognition performance, whereas the performance differences between the perceptually motivated filterbanks are minimal in the proposed framework. Furthermore, this work reports significant gains in speech recognition accuracy at low SNRs over MFCCs, GFCCs, and Directional Derivatives extracted from the log-Mel spectrogram.
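The abstract does not spell out the feature pipeline, so the following is only a minimal NumPy sketch under stated assumptions: a Hann-windowed STFT power spectrogram (25 ms frames and a 10 ms hop at an assumed 16 kHz sampling rate), two illustrative dynamic range compression schemes (log and cube root), and first-order finite-difference derivatives along four hypothetical orientations. All function names are hypothetical, the Mel and Gammatone filterbank stages (which would be applied to the power spectrogram before compression) are omitted, and the directional filters used by the authors may well differ.

```python
import numpy as np


def stft_power(signal, frame_len=400, hop=160):
    """Power spectrogram shaped (freq, time) from a Hann-windowed STFT.

    frame_len=400 and hop=160 (25 ms / 10 ms at 16 kHz) are assumed defaults.
    """
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop : i * hop + frame_len] * window
                       for i in range(n_frames)])
    # rfft along each frame gives frame_len // 2 + 1 frequency bins.
    return (np.abs(np.fft.rfft(frames, axis=1)) ** 2).T


def compress(spec, scheme="log", eps=1e-10):
    """Dynamic range compression; 'log' and 'root' are illustrative choices."""
    if scheme == "log":
        return np.log(spec + eps)  # eps avoids log(0) on silent bins
    if scheme == "root":
        return np.cbrt(spec)       # cube-root (power-law) compression
    raise ValueError(f"unknown scheme: {scheme}")


def directional_derivatives(spec, angles_deg=(0, 45, 90, 135)):
    """First-order derivatives of a (freq, time) image along each orientation.

    Hypothetical convention: 0 deg points along time, 90 deg along
    increasing frequency index.
    """
    d_freq, d_time = np.gradient(spec)  # axis 0 = frequency, axis 1 = time
    out = [np.cos(a) * d_time + np.sin(a) * d_freq
           for a in np.deg2rad(angles_deg)]
    return np.stack(out)  # shape: (n_angles, n_freq, n_time)


# Usage: 1 s of white noise at the assumed 16 kHz rate.
rng = np.random.default_rng(0)
x = rng.standard_normal(16000)
S = stft_power(x)
feats = directional_derivatives(compress(S, "root"))
print(feats.shape)  # (4, 201, 98)
```

Swapping `"root"` for `"log"` is the kind of compression choice the abstract identifies as most influential; in this sketch it changes only the image the derivatives are taken over, while the derivative stage stays fixed.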

Subject: INTERSPEECH.2014 - Speech Recognition

gibson14@interspeech_2014@ISCA