Backchannel prediction for natural spoken dialog systems using general speaker and listener information

fukunaga25@interspeech_2025@ISCA

Authors: Yoshinori Fukunaga, Ryota Nishimura, Kengo Ohta, Norihide Kitaoka

Backchannel responses are a crucial component of conversation, enabling more effective communication through listener feedback. Current backchannel prediction models classify these responses into just three categories, using speech, text, and listener IDs. However, these IDs, which contain detailed personal information, cannot be applied in real-world dialog systems, and three-category classification limits response generation capabilities. We therefore propose a model that predicts a backchannel's 'surface form' using only general speaker and listener embeddings. Our experiments show a 1.3% improvement in prediction accuracy for 3-category classification and a 0.9% improvement for 11-category classification, compared to conventional ID embeddings, demonstrating a performance gain that is deployable in real-world systems.

Subject: INTERSPEECH.2025 - Others
