Voice activity detection using speech recognizer feedback

Authors: Kit Thambiratnam, Weiwu Zhu, Frank Seide

This paper demonstrates how feedback from a speech recognizer can be leveraged to improve Voice Activity Detection (VAD) for online speech recognition. First, reliably transcribed segments of audio are fed back by the recognizer as supervision for VAD model adaptation. This allows the much stronger large-vocabulary continuous speech recognition (LVCSR) acoustic models to be harnessed without adding computation. Second, the timing of each VAD decision is dictated by the recognizer rather than by the VAD module, which gives the VAD an implicit dynamic look-ahead. This look-ahead improves robustness, and it can be gracefully reduced to meet latency requirements without retraining or retuning the VAD module. Experiments on telephone conversations yielded a 6.7% absolute reduction in frame classification error rate when feedback was applied to HMM-based VAD, and a 4.2% absolute reduction over the best baseline system. Furthermore, a 3.0% absolute reduction in word error rate (WER) was achieved over the best baseline in speech recognition experiments.
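The first idea in the abstract, using confidently transcribed segments as supervision for VAD adaptation, can be illustrated with a small sketch. This is not the paper's method: the paper adapts an HMM-based VAD, whereas the sketch below swaps in a two-class Gaussian classifier over a single frame-energy feature. The names `GaussianClass`, `FeedbackVad`, `adapt`, `min_confidence`, and `rate` are hypothetical, as are the confidence threshold and the running-mean update rule.

```python
import math
from dataclasses import dataclass


@dataclass
class GaussianClass:
    """One-dimensional Gaussian model of frame energy for a single class."""
    mean: float
    var: float

    def log_likelihood(self, x: float) -> float:
        return -0.5 * (math.log(2 * math.pi * self.var)
                       + (x - self.mean) ** 2 / self.var)


class FeedbackVad:
    """Toy VAD whose class means are adapted from recognizer feedback.

    Assumption: the recognizer returns per-frame speech/non-speech labels
    for a segment together with one segment-level confidence score.
    """

    def __init__(self, speech: GaussianClass, nonspeech: GaussianClass,
                 rate: float = 0.1):
        self.speech = speech
        self.nonspeech = nonspeech
        self.rate = rate  # hypothetical adaptation step size

    def is_speech(self, energy: float) -> bool:
        return (self.speech.log_likelihood(energy)
                > self.nonspeech.log_likelihood(energy))

    def adapt(self, frames, labels, confidence: float,
              min_confidence: float = 0.9) -> None:
        """Use a segment as supervision only if it was reliably transcribed."""
        if confidence < min_confidence:
            return  # unreliable feedback is ignored
        for x, label in zip(frames, labels):
            target = self.speech if label else self.nonspeech
            # Running-mean update toward the observed frame energy.
            target.mean += self.rate * (x - target.mean)


def frame_error_rate(predicted, reference) -> float:
    """Fraction of frames whose speech/non-speech label is wrong."""
    errors = sum(p != r for p, r in zip(predicted, reference))
    return errors / len(reference)
```

In this sketch, a VAD initialized for a quiet channel would misclassify louder background noise as speech until a few confidently labeled segments shift the class means. The second idea in the abstract, letting the recognizer decide when a VAD decision is made, is not modeled here.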

Subject: INTERSPEECH.2012 - Speech Processing

thambiratnam12@interspeech_2012@ISCA
