2025.acl-long.1165@ACL

#1 Enhancing Machine Translation with Self-Supervised Preference Data

Authors: Haoxiang Sun, Ruize Gao, Pei Zhang, Baosong Yang, Rui Wang

Model alignment methods such as Direct Preference Optimization (DPO) and Contrastive Preference Optimization (CPO) have enhanced machine translation performance by leveraging preference data to teach models to reject suboptimal outputs. During preference data construction, prior approaches rely primarily on human annotators, strong models such as GPT-4, or model self-sampling. In this study, we first analyze the shortcomings of these practices. We then propose Self-Supervised Preference Optimization (SSPO), a novel framework that efficiently constructs translation preference data for iterative DPO training. Applying SSPO to 14B-parameter large language models (LLMs) achieves performance comparable to or better than that of GPT-4o on the FLORES and multi-domain test sets. We release an augmented MQM dataset at https://github.com/sunny-sjtu/MQM-aug.
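For readers unfamiliar with the preference-optimization objective the abstract builds on, below is a minimal sketch of the standard DPO loss (Rafailov et al., 2023) over batches of (preferred, rejected) translation pairs. The function name, tensor shapes, and beta value are illustrative; the abstract does not detail SSPO's own data-construction procedure, so only the generic DPO objective is shown.

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """Standard DPO loss over a batch of (preferred, rejected) pairs.

    Each argument is a tensor of summed token log-probabilities with
    shape (batch,). beta controls how strongly the policy is kept
    close to the frozen reference model.
    """
    # Log-ratio of policy vs. reference for preferred and rejected outputs.
    chosen_rewards = policy_chosen_logps - ref_chosen_logps
    rejected_rewards = policy_rejected_logps - ref_rejected_logps
    # DPO maximizes the margin between preferred and rejected log-ratios.
    logits = beta * (chosen_rewards - rejected_rewards)
    return -F.logsigmoid(logits).mean()

# Toy usage with random log-probabilities for a batch of 4 pairs.
b = 4
loss = dpo_loss(torch.randn(b), torch.randn(b), torch.randn(b), torch.randn(b))
print(loss.item())
```

In an iterative setup like the one the abstract describes, each DPO round would be preceded by constructing fresh preference pairs from the current policy's translations before recomputing this loss.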

Subject: ACL.2025 - Long Papers