mote25b@interspeech_2025@ISCA


#1 Vector Quantized Cross-lingual Unsupervised Domain Adaptation for Speech Emotion Recognition

Authors: Pravin Mote, Donita Robinson, Elizabeth Richerson, Carlos Busso

Building speech emotion recognition (SER) models for low-resource languages is challenging due to the scarcity of labeled speech data. This limitation motivates cross-lingual unsupervised domain adaptation techniques that effectively leverage labeled data from resource-rich languages. Inspired by the TransVQA framework, we propose a method that uses a shared quantized feature space to enable knowledge transfer between labeled and unlabeled data across languages. The approach employs a quantized codebook to capture shared features while reducing the domain gap and aligning class distributions, thereby improving classification accuracy. Additionally, an information loss (InfoLoss) mechanism mitigates the loss of critical information during quantization by minimizing the loss within the simplex of posterior class label distributions. The proposed method outperforms state-of-the-art baselines.
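The core codebook operation the abstract describes can be illustrated with a generic vector-quantization sketch: each continuous feature vector is replaced by its nearest entry in a shared codebook, so that source- and target-language features map into the same discrete space. This is a minimal illustration of standard VQ, not the authors' TransVQA-based implementation; the function name, shapes, and toy data are assumptions.

```python
import numpy as np

def vector_quantize(features, codebook):
    """Map each feature vector to its nearest codebook entry (L2 distance).

    Generic VQ sketch (illustrative, not the paper's exact method):
    features: (N, D) array of continuous embeddings
    codebook: (K, D) array of learnable code vectors
    Returns the quantized features (N, D) and the chosen code indices (N,).
    """
    # Pairwise squared distances between features and codes: shape (N, K)
    dists = ((features[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    indices = dists.argmin(axis=1)      # nearest code per feature
    return codebook[indices], indices

# Toy example: 4 features quantized against a 3-entry codebook in 2-D
codebook = np.array([[0.0, 0.0], [1.0, 1.0], [-1.0, 1.0]])
features = np.array([[0.1, -0.1], [0.9, 1.2], [-0.8, 0.9], [0.2, 0.1]])
quantized, idx = vector_quantize(features, codebook)
# idx is [0, 1, 2, 0]: each row of `quantized` is a codebook entry
```

In a cross-lingual setting, features from both languages would pass through the same codebook, which is what encourages a shared discrete representation across domains.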

Subject: INTERSPEECH.2025 - Speech Recognition