Contextual Acoustic Barge-In Classification for Spoken Dialog Systems

Authors: Dhanush Bekal, Sundararajan Srinivasan, Srikanth Ronanki, Sravan Bodapati, Katrin Kirchhoff

In this work, we define barge-in verification as a supervised learning task where audio-only information is used to classify user spoken dialogue into true and false barge-ins. Following the success of pre-trained models, we use low-level speech representations from a self-supervised representation learning model for our downstream classification task. Further, we propose a novel technique to infuse lexical information directly into speech representations to improve the domain-specific language information implicitly learned during pre-training. Experiments conducted on spoken dialog data show that our proposed model, trained to validate barge-in entirely from speech representations, is faster by 38% relative and achieves a 4.5% relative F1 score improvement over a baseline LSTM model that uses both audio and Automatic Speech Recognition (ASR) 1-best hypotheses. On top of this, our best proposed model, with lexically infused representations along with contextual features, provides a further relative improvement of 5.7% in the F1 score but is only 22% faster than the baseline.
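The abstract reports only relative figures, so a short arithmetic sketch may help show how they combine. It assumes that the "further" 5.7% gain is measured relative to the speech-only model rather than to the baseline, which the abstract does not state outright. The baseline F1 of 0.80 is a hypothetical value chosen for illustration, and the function names are illustrative, not taken from the paper.

```python
def apply_relative_gain(baseline: float, relative_gain_pct: float) -> float:
    """Return the metric value after a relative improvement given in percent."""
    return baseline * (1.0 + relative_gain_pct / 100.0)


def compound_relative_gain(*gains_pct: float) -> float:
    """Combine successive relative gains (in percent) into one overall gain (in percent)."""
    factor = 1.0
    for gain in gains_pct:
        factor *= 1.0 + gain / 100.0
    return (factor - 1.0) * 100.0


# Hypothetical baseline F1: the abstract does not report absolute scores.
baseline_f1 = 0.80

# Speech-only model: 4.5% relative F1 improvement over the baseline.
speech_only_f1 = apply_relative_gain(baseline_f1, 4.5)

# Best model: assumed to be a further 5.7% relative to the speech-only model.
best_f1 = apply_relative_gain(speech_only_f1, 5.7)

# Overall relative gain of the best model over the baseline under that assumption.
overall_gain_pct = compound_relative_gain(4.5, 5.7)

print(f"speech-only F1: {speech_only_f1:.4f}")
print(f"best-model F1:  {best_f1:.4f}")
print(f"overall relative gain: {overall_gain_pct:.2f}%")
```

Under this reading the two gains multiply rather than add, so the best model's overall relative improvement over the baseline is slightly above the simple sum of 4.5% and 5.7%.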

Subject: INTERSPEECH.2022 - Language and Multimodal

bekal22@interspeech_2022@ISCA
