Dynamic stream weight estimation in coupled-HMM-based audio-visual speech recognition using multilayer perceptrons

Authors: Ahmed Hussen Abdelaziz, Dorothea Kolossa

Jointly using audio and video features can increase the robustness of automatic speech recognition systems in noisy environments. A systematic and reliable performance gain, however, is achieved only if the contributions of the audio and video streams to the decoding decision are dynamically optimized, for example via so-called stream weights. In this paper, we address the problem of dynamic stream weight estimation for coupled-HMM-based audio-visual speech recognition. We investigate the multilayer perceptron (MLP) for mapping reliability measures to stream weights. As input to the MLP, we use a feature vector containing different model-based and signal-based reliability measures. The MLP is trained using dynamic oracle stream weights as target outputs, which are found using a recently proposed expectation-maximization algorithm. This new approach to MLP-based stream weight estimation has been evaluated on the Grid audio-visual corpus and has outperformed the best baseline, yielding a 23.72% average relative error rate reduction.
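As a rough illustration of the idea in the abstract, the sketch below maps per-frame reliability features to a stream weight with a small MLP and then uses that weight to combine audio and video log-likelihoods. It is not the authors' implementation. The abstract does not specify the network architecture or the exact combination rule, so the single tanh hidden layer, the sigmoid output, the convention that the weight applies to the audio stream and its complement to the video stream, and all function and parameter names are assumptions made for this example.

```python
import numpy as np


def mlp_stream_weights(reliability, W1, b1, W2, b2):
    """Map reliability features (T x D) to one stream weight per frame.

    Assumed architecture: one tanh hidden layer and a sigmoid output,
    so each weight lies strictly between 0 and 1.
    """
    hidden = np.tanh(reliability @ W1 + b1)   # (T, H)
    logits = hidden @ W2 + b2                 # (T, 1)
    weights = 1.0 / (1.0 + np.exp(-logits))   # sigmoid
    return weights.ravel()                    # (T,)


def combine_stream_scores(log_b_audio, log_b_video, stream_weight):
    """Weighted sum of per-state log-likelihoods for each frame.

    Assumed convention: lambda_t weights the audio stream and
    (1 - lambda_t) weights the video stream.
    """
    lam = np.asarray(stream_weight, dtype=float)[:, None]  # (T, 1)
    return lam * log_b_audio + (1.0 - lam) * log_b_video


if __name__ == "__main__":
    # Hypothetical sizes: frames, states, reliability features, hidden units.
    T, S, D, H = 5, 3, 4, 6
    rng = np.random.default_rng(0)

    reliability = rng.normal(size=(T, D))   # stand-in reliability measures
    W1, b1 = rng.normal(size=(D, H)), np.zeros(H)
    W2, b2 = rng.normal(size=(H, 1)), np.zeros(1)

    log_b_audio = rng.normal(size=(T, S))   # stand-in audio log-likelihoods
    log_b_video = rng.normal(size=(T, S))   # stand-in video log-likelihoods

    lam = mlp_stream_weights(reliability, W1, b1, W2, b2)
    combined = combine_stream_scores(log_b_audio, log_b_video, lam)
    print(lam.shape, combined.shape)
```

In a setup like the one described, the MLP parameters would be fit by regression against the oracle stream weights rather than drawn at random as they are here; the random values only make the sketch runnable.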

Subject: INTERSPEECH.2014 - Speech Processing

abdelaziz14@interspeech_2014@ISCA
