palaz16@interspeech_2016@ISCA

Total: 1

#1 Jointly Learning to Locate and Classify Words Using Convolutional Networks [PDF] [Copy] [Kimi1]

Authors: Dimitri Palaz ; Gabriel Synnaeve ; Ronan Collobert

In this paper, we propose a novel approach for weakly-supervised word recognition. Most state of the art automatic speech recognition systems are based on frame-level labels obtained through forced alignments or through a sequential loss. Recently, weakly-supervised trained models have been proposed in vision, that can learn which part of the input is relevant for classifying a given pattern [1]. Our system is composed of a convolutional neural network and a temporal score aggregation mechanism. For each sentence, it is trained using as supervision only some of the words (most frequent) that are present in a given sentence, without knowing their order nor quantity. We show that our proposed system is able to jointly classify and localize words. We also evaluate the system on a keyword spotting task, and show that it can yield similar performance to strong supervised HMM/GMM baseline.