siohan13@interspeech_2013@ISCA

Total: 1

#1 iVector-based acoustic data selection

Authors: Olivier Siohan ; Michiel Bacchiani

This paper presents a data selection approach in which spoken utterances are selected sequentially from a large out-of-domain data set so that the selected set matches the utterance distribution of an in-domain data set. We propose to represent each utterance by its iVector, a low-dimensional vector giving the coordinates of that utterance in a subspace acoustic model. We show that the distribution of iVectors can characterize a data set and makes it possible to distinguish subsets of utterances from different domains. Finally, we present experimental speech recognition results for a system trained on a data set constructed by the proposed algorithm, and compare them with random data selection.
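
The abstract does not spell out the selection criterion, so the following is only a minimal sketch of one way sequential, distribution-matching selection could work, not the authors' algorithm. It assumes that iVectors are quantized to the nearest of a set of precomputed centroids, that a data set is summarized by a smoothed histogram over those centroids, and that an out-of-domain utterance is kept only if adding it lowers the KL divergence from the in-domain histogram. All names (`nearest_centroid`, `histogram`, `select_sequential`, `smoothing`) are hypothetical.

```python
import math
from collections import Counter


def nearest_centroid(vec, centroids):
    """Return the index of the centroid closest to vec (squared Euclidean)."""
    return min(
        range(len(centroids)),
        key=lambda k: sum((a - b) ** 2 for a, b in zip(vec, centroids[k])),
    )


def histogram(labels, n_clusters, smoothing=1e-3):
    """Smoothed relative frequency of each cluster label (assumed summary)."""
    counts = Counter(labels)
    total = len(labels) + smoothing * n_clusters
    return [(counts[k] + smoothing) / total for k in range(n_clusters)]


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions with strictly positive entries."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))


def select_sequential(in_domain, out_of_domain, centroids):
    """Scan out-of-domain iVectors once, in order, and keep an utterance only
    if adding it moves the selected-set histogram closer to the in-domain one.
    The acceptance rule is an assumption made for illustration."""
    n = len(centroids)
    target = histogram([nearest_centroid(v, centroids) for v in in_domain], n)
    selected, labels = [], []
    best = math.inf  # the first candidate is always accepted
    for idx, vec in enumerate(out_of_domain):
        label = nearest_centroid(vec, centroids)
        score = kl_divergence(target, histogram(labels + [label], n))
        if score < best:
            selected.append(idx)
            labels.append(label)
            best = score
    return selected
```

With two centroids and an in-domain set that falls three-to-one into the first cluster, the scan ends up keeping out-of-domain utterances in roughly the same three-to-one proportion. A random-selection baseline, as in the comparison the abstract mentions, would instead draw indices uniformly and inherit the out-of-domain proportions.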