Learning essential speaker sub-space using hetero-associative neural networks for speaker clustering


Authors: Shajith Ikbal, Karthik Visweswariah

In this paper, we present a novel approach to speaker clustering that uses a hetero-associative neural network (HANN) to compute very low-dimensional speaker-discriminatory features (in our case, 1-dimensional) in a data-driven manner. A HANN trained to map the input feature space onto speaker labels through a bottleneck hidden layer is expected to learn a very low-dimensional feature subspace that essentially contains the speaker information. These low-dimensional features are then used in a simple k-means clustering algorithm to obtain the speaker segmentation. Evaluation of this approach on a database of real-life conversational speech from call centers shows that the clustering performance achieved is similar to that of state-of-the-art systems, even though our approach uses just 1-dimensional features. Augmenting the traditional mel-frequency cepstral coefficient (MFCC) features of the state-of-the-art system with these features resulted in improved clustering performance.
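The pipeline described in the abstract (a network trained on speaker labels through a 1-dimensional bottleneck, followed by k-means on the bottleneck outputs) can be sketched roughly as below. This is only an illustrative sketch, not the authors' implementation: the abstract does not specify the network topology, activations, training procedure, or k-means initialization, so the single tanh hidden layer, the linear bottleneck, plain full-batch gradient descent, the quantile-based initialization, and all function names (`train_bottleneck_net`, `bottleneck_features`, `kmeans_1d`) are assumptions made for the example.

```python
import numpy as np

def train_bottleneck_net(X, y, n_hidden=16, n_epochs=300, lr=0.1, seed=0):
    """Train input -> tanh hidden -> 1-D linear bottleneck -> softmax over labels.

    Assumed architecture for illustration; the abstract does not give one.
    """
    rng = np.random.default_rng(seed)
    n_in, n_out = X.shape[1], int(y.max()) + 1
    W1 = rng.normal(0, 0.1, (n_in, n_hidden)); b1 = np.zeros(n_hidden)
    W2 = rng.normal(0, 0.1, (n_hidden, 1));    b2 = np.zeros(1)
    W3 = rng.normal(0, 0.1, (1, n_out));       b3 = np.zeros(n_out)
    Y = np.eye(n_out)[y]                       # one-hot speaker labels
    for _ in range(n_epochs):
        # Forward pass
        h = np.tanh(X @ W1 + b1)
        z = h @ W2 + b2                        # 1-D bottleneck feature
        logits = z @ W3 + b3
        logits -= logits.max(axis=1, keepdims=True)  # numerical stability
        p = np.exp(logits)
        p /= p.sum(axis=1, keepdims=True)
        # Backward pass for mean softmax cross-entropy
        d_logits = (p - Y) / len(X)
        dW3 = z.T @ d_logits; db3 = d_logits.sum(0)
        dz = d_logits @ W3.T
        dW2 = h.T @ dz;       db2 = dz.sum(0)
        dh = (dz @ W2.T) * (1 - h ** 2)
        dW1 = X.T @ dh;       db1 = dh.sum(0)
        # Plain gradient descent step
        W1 -= lr * dW1; b1 -= lr * db1
        W2 -= lr * dW2; b2 -= lr * db2
        W3 -= lr * dW3; b3 -= lr * db3
    return W1, b1, W2, b2, W3, b3

def bottleneck_features(X, params):
    """Project input features onto the learned 1-D bottleneck."""
    W1, b1, W2, b2 = params[:4]
    return np.tanh(X @ W1 + b1) @ W2 + b2

def kmeans_1d(z, k, n_iter=50):
    """Simple k-means on 1-D features with deterministic quantile initialization."""
    z = np.asarray(z, dtype=float).ravel()
    centers = np.quantile(z, (np.arange(k) + 0.5) / k)
    assign = np.zeros(len(z), dtype=int)
    for _ in range(n_iter):
        assign = np.argmin(np.abs(z[:, None] - centers[None, :]), axis=1)
        for j in range(k):
            if np.any(assign == j):            # leave empty clusters unchanged
                centers[j] = z[assign == j].mean()
    return assign, centers
```

In use, one would train the network on labeled frames from known speakers, call `bottleneck_features` on the frames of a new conversation, and pass the resulting 1-D values to `kmeans_1d` with `k` set to the expected number of speakers. Whether such a toy setup trains well depends on the data and hyperparameters, which are not taken from the paper.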

Subject: INTERSPEECH.2008 - Others

ikbal08@interspeech_2008@ISCA
