shankar18@interspeech_2018@ISCA

Total: 1

#1 Spoken Keyword Detection Using Joint DTW-CNN

Authors: Ravi Shankar ; C M Vikram ; S R M Prasanna

A method to detect spoken keywords in a given speech utterance is proposed, called joint Dynamic Time Warping (DTW)-Convolutional Neural Network (CNN). It combines the DTW approach with a strong classifier such as a CNN. These two methods have independently shown significant results in optimal sequence alignment and object recognition, respectively. The proposed method modifies the original DTW formulation and converts the warping matrix into a grayscale image. A CNN is trained on these images to classify the presence or absence of a keyword by identifying the texture of the warping matrix. Experiments were conducted on the TIMIT corpus, and our method shows significant improvement over other existing techniques.
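As a rough illustration of the matrix-to-image step, the sketch below computes a textbook accumulated-cost DTW matrix between a keyword template and an utterance and rescales it to 8-bit gray levels. It is not the paper's formulation: the abstract says the original DTW is modified but does not say how, so standard DTW with a Euclidean frame distance is assumed here. The function names `dtw_cost_matrix` and `to_grayscale` and the min-max scaling are hypothetical choices, and the CNN classifier is omitted.

```python
import math


def dtw_cost_matrix(query, utterance):
    """Accumulated-cost matrix of standard DTW (assumed, not the paper's variant).

    query, utterance: lists of feature frames, each frame a list of floats.
    Returns a len(query) x len(utterance) matrix as a list of lists.
    """
    n, m = len(query), len(utterance)
    # Pad with an extra row and column of infinity so the recurrence
    # needs no special cases at the borders.
    acc = [[math.inf] * (m + 1) for _ in range(n + 1)]
    acc[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Local cost: Euclidean distance between the two frames.
            local = math.dist(query[i - 1], utterance[j - 1])
            # Allowed moves: vertical, horizontal, diagonal.
            acc[i][j] = local + min(acc[i - 1][j], acc[i][j - 1], acc[i - 1][j - 1])
    # Drop the padding row and column.
    return [row[1:] for row in acc[1:]]


def to_grayscale(matrix):
    """Min-max scale a cost matrix to integer gray levels in [0, 255]."""
    values = [v for row in matrix for v in row]
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1.0  # avoid division by zero for a constant matrix
    return [[round(255 * (v - lo) / span) for v in row] for row in matrix]


# Toy one-dimensional "features"; real systems would use e.g. MFCC frames.
keyword = [[0.0], [1.0], [2.0]]
speech = [[0.0], [1.0], [1.0], [2.0]]
image = to_grayscale(dtw_cost_matrix(keyword, speech))
```

In a full pipeline along the lines the abstract describes, images like `image` would be produced for many keyword and utterance pairs and used as training input for a binary CNN classifier.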