Backchannel responses are a crucial component of conversation, enabling more effective communication through listener feedback. Current backchannel prediction models classify these responses into just three categories, using speech, text, and listener IDs as input. However, these IDs contain detailed personal information and therefore cannot be applied in real-world dialog systems, and restricting classification to three categories limits response generation capabilities. We therefore propose a model that predicts a backchannel's 'surface form' using only general speaker and listener embeddings. Compared with conventional ID embeddings, our experiments show a 1.3% improvement in prediction accuracy for 3-category classification and a 0.9% improvement for 11-category classification. This demonstrates improved performance from a model that can be deployed in real-world systems.