2025.findings-emnlp.704@ACL


BiMax: Bidirectional MaxSim Score for Document-Level Alignment

Authors: Xiaotian Wang, Takehito Utsuro, Masaaki Nagata

Document alignment, which aligns documents across source and target languages within the same web domain, is a necessary step in hierarchical mining. Several high-precision sentence embedding-based methods have been developed, such as TK-PERT and Optimal Transport (OT). However, given the massive scale of web mining data, both accuracy and speed must be considered. In this paper, we propose a cross-lingual Bidirectional MaxSim score (BiMax) for computing doc-to-doc similarity, improving efficiency compared to the OT method. On the WMT16 bilingual document alignment task, BiMax attains accuracy comparable to OT with an approximately 100-fold speedup. We also conduct a comprehensive analysis of the performance of current state-of-the-art multilingual sentence embedding models.
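The abstract does not give the BiMax formula, so the following is only a minimal sketch under stated assumptions. It assumes a ColBERT-style MaxSim over sentence embeddings: each document is a matrix of unit-normalized sentence vectors, every sentence is matched to its most similar sentence in the other document by cosine similarity, the maxima are averaged, and the two directions are averaged. The names `bimax` and `best_match` and the equal-weight averaging are hypothetical choices, not the authors' definition.

```python
import numpy as np

def l2_normalize(x: np.ndarray) -> np.ndarray:
    """Scale each row (one sentence embedding) to unit length."""
    return x / np.linalg.norm(x, axis=1, keepdims=True)

def bimax(src: np.ndarray, tgt: np.ndarray) -> float:
    """Hypothetical BiMax sketch (assumption): mean of MaxSim in both directions.

    src: (m, d) sentence embeddings of the source-language document
    tgt: (n, d) sentence embeddings of the target-language document
    """
    sim = l2_normalize(src) @ l2_normalize(tgt).T  # (m, n) cosine similarities
    src_to_tgt = sim.max(axis=1).mean()  # best target match per source sentence
    tgt_to_src = sim.max(axis=0).mean()  # best source match per target sentence
    return float(0.5 * (src_to_tgt + tgt_to_src))

def best_match(src: np.ndarray, candidates: list[np.ndarray]) -> int:
    """Hypothetical helper: index of the candidate document with the highest score."""
    return int(np.argmax([bimax(src, c) for c in candidates]))
```

Under these assumptions, scoring a document pair needs only one matrix product plus row and column maxima, whereas OT solves an optimization problem over the similarity matrix; this is consistent with, though not a verified account of, the speedup reported above.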

Subject: EMNLP.2025 - Findings