Making LLMs Better Many-to-Many Speech-to-Text Translators with Curriculum Learning

Authors: Yexing Du, Youcheng Pan, Ziyang Ma, Bo Yang, Yifan Yang, Keqi Deng, Xie Chen, Yang Xiang, Ming Liu, Bing Qin

Multimodal Large Language Models (MLLMs) have achieved significant success in Speech-to-Text Translation (S2TT) tasks. While most existing research has focused on English-centric translation directions, the exploration of many-to-many translation is still limited by the scarcity of parallel data. To address this, we propose a three-stage curriculum learning strategy that leverages the machine translation capabilities of large language models and adapts them to S2TT tasks, enabling effective learning in low-resource settings. We trained MLLMs with varying parameter sizes (3B, 7B, and 32B) and evaluated the proposed strategy using the FLEURS and CoVoST-2 datasets. Experimental results show that the proposed strategy achieves state-of-the-art average performance in 15×14 language pairs, requiring fewer than 10 hours of speech data per language to achieve competitive results. The source code and models are released at https://github.com/yxduir/LLM-SRT.

Subject: ACL.2025 - Long Papers

2025.acl-long.610@ACL
