arXiv: 2501.17615


Cross-lingual Embedding Clustering for Hierarchical Softmax in Low-Resource Multilingual Speech Recognition

Authors: Zhengdong Yang, Qianying Liu, Sheng Li, Fei Cheng, Chenhui Chu

We present a novel approach that targets the decoding stage of Automatic Speech Recognition (ASR) to improve multilingual performance, especially for low-resource languages. It uses a cross-lingual embedding clustering method to construct a hierarchical Softmax (H-Softmax) decoder, which lets similar tokens across different languages share similar decoder representations. This addresses a limitation of the previous Huffman-based H-Softmax method, which relied on shallow features to assess token similarity. Experiments on a downsampled dataset of 15 languages demonstrate that our approach improves low-resource multilingual ASR accuracy.
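To make the idea concrete, here is a minimal sketch of how a clustering-derived tree can drive an H-Softmax output layer. It is not the authors' implementation. The embedding source, the clustering algorithm, and every name below (`EMBEDDINGS`, `build_tree`, `assign_paths`, `h_softmax_prob`) are hypothetical. Plain average-linkage agglomerative clustering on cosine similarity stands in for the paper's cross-lingual clustering method. Each token's probability is the product of binary sigmoid decisions along its root-to-leaf path, so tokens that are clustered together share the node parameters near the top of the tree.

```python
import math
import random
from itertools import count

# Hypothetical 2-D "cross-lingual" embeddings for four subword tokens.
# English/Spanish translation pairs are deliberately placed close together.
EMBEDDINGS = {
    "_cat": [1.0, 0.1],
    "_gato": [0.9, 0.2],
    "_dog": [0.1, 1.0],
    "_perro": [0.2, 0.9],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def build_tree(embeddings):
    """Agglomerative clustering with average-linkage cosine similarity.

    Leaves are token strings; each internal node is a (left, right) tuple.
    """
    clusters = [(tok, [tok]) for tok in embeddings]  # (subtree, member tokens)
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                mi, mj = clusters[i][1], clusters[j][1]
                sim = sum(cosine(embeddings[a], embeddings[b])
                          for a in mi for b in mj) / (len(mi) * len(mj))
                if best is None or sim > best[0]:
                    best = (sim, i, j)
        _, i, j = best
        merged = ((clusters[i][0], clusters[j][0]), clusters[i][1] + clusters[j][1])
        # Drop the two merged clusters and append their union.
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return clusters[0][0]


def assign_paths(tree):
    """Map each token to its root-to-leaf path of (internal_node_id, branch_bit)."""
    paths, ids = {}, count()

    def walk(node, prefix):
        if isinstance(node, str):  # a leaf is a token
            paths[node] = prefix
            return
        node_id = next(ids)
        walk(node[0], prefix + [(node_id, 0)])
        walk(node[1], prefix + [(node_id, 1)])

    walk(tree, [])
    return paths, next(ids)  # next(ids) equals the number of internal nodes


def h_softmax_prob(token, hidden, paths, node_vectors):
    """P(token | hidden) as a product of binary sigmoid decisions along the path."""
    prob = 1.0
    for node_id, bit in paths[token]:
        score = sum(w * h for w, h in zip(node_vectors[node_id], hidden))
        p_left = 1.0 / (1.0 + math.exp(-score))
        prob *= p_left if bit == 0 else 1.0 - p_left
    return prob


tree = build_tree(EMBEDDINGS)
paths, n_internal = assign_paths(tree)
rng = random.Random(0)
# Random stand-ins for learned decoder parameters: one vector per internal node.
node_vectors = [[rng.gauss(0.0, 1.0) for _ in range(2)] for _ in range(n_internal)]
hidden = [0.5, -0.3]  # hypothetical decoder hidden state

probs = {tok: h_softmax_prob(tok, hidden, paths, node_vectors) for tok in EMBEDDINGS}
print(tree)
print(probs, sum(probs.values()))
```

In this toy setup, `_cat` and `_gato` end up as siblings under one internal node, and `_dog` and `_perro` under another. Every gradient that reaches a shared node therefore updates the representation used by both languages. A Huffman tree, by contrast, is built only from token frequencies, so translation pairs would share nodes only by coincidence. Because each split assigns probabilities `p` and `1 - p`, the leaf probabilities always sum to 1.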

Subjects: Computation and Language, Sound, Audio and Speech Processing

Published: 2025-01-29 12:44:30 UTC