2025.acl-short.64@ACL

Total: 1

#1 Diffusion Directed Acyclic Transformer for Non-Autoregressive Machine Translation

Authors: Quan Nguyen-Tri, Cong Dao Tran, Hoang Thanh-Tung

Non-autoregressive transformers (NATs) predict entire sequences in parallel to reduce decoding latency, but they often suffer degraded performance due to the multi-modality problem. A recent advancement, the Directed Acyclic Transformer (DAT), addresses this issue by mapping multiple translation modalities to distinct paths in a Directed Acyclic Graph (DAG). However, DAT relies on the latent variables introduced through Glancing Training (GLAT) to attain state-of-the-art performance. In this paper, we introduce the Diffusion Directed Acyclic Transformer (Diff-DAT), an alternative to GLAT for introducing latent variables into DAT. Diff-DAT offers two significant benefits over the previous approach. First, it establishes a stronger alignment between training and inference. Second, it enables a more flexible trade-off between quality and latency.
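To make the DAG formulation concrete: DAT scores a target sequence by marginalizing over all monotone paths through the decoder vertices, P(Y | X) = Σ_A P(A | X) P(Y | A, X), computed with a forward dynamic program. Below is a minimal NumPy sketch of that marginalization; it is not the authors' implementation, and the function name `dat_log_marginal`, the toy shapes, and the convention that paths run from the first to the last DAG vertex are illustrative assumptions for this sketch.

```python
# A minimal sketch of DAT's path marginalization over its decoding DAG:
#   P(Y | X) = sum_A P(A | X) * P(Y | A, X),
# where a path A = (a_1 < ... < a_N) assigns one DAG vertex per target token.
import numpy as np

def dat_log_marginal(log_emit, log_trans, target):
    """log P(Y | X), summed over all monotone DAG paths (hypothetical helper).

    log_emit:  (L, V) log token probabilities at each of the L DAG vertices.
    log_trans: (L, L) log transition probabilities; only entries with j > i are used.
    target:    length-N list of token ids (N <= L); paths are anchored at vertex 0.
    """
    L, _ = log_emit.shape
    N = len(target)
    # f[i] = log-prob of emitting target[:t] with the t-th token placed at vertex i.
    f = np.full(L, -np.inf)
    f[0] = log_emit[0, target[0]]
    for t in range(1, N):
        g = np.full(L, -np.inf)
        for j in range(t, L):  # vertex j must leave room for t earlier tokens
            # log-sum-exp over all predecessor vertices i < j
            scores = f[:j] + log_trans[:j, j]
            m = scores.max()
            if m > -np.inf:
                g[j] = m + np.log(np.exp(scores - m).sum()) + log_emit[j, target[t]]
        f = g
    # Following the DAT convention, the final token lands on the last vertex.
    return f[-1]

# Toy usage with random, properly normalized log-probabilities:
rng = np.random.default_rng(0)
L, V = 8, 16
log_emit = np.log(rng.dirichlet(np.ones(V), size=L))
upper = np.triu(rng.random((L, L)), k=1)
row_sums = upper.sum(1, keepdims=True)
row_sums[row_sums == 0] = 1.0            # last vertex has no successors
with np.errstate(divide="ignore"):       # log(0) -> -inf marks forbidden edges
    log_trans = np.log(upper / row_sums)
print(dat_log_marginal(log_emit, log_trans, [3, 1, 4, 1]))
```

The dynamic program is the same shape as an HMM forward pass; in practice this loop is vectorized on GPU, but the per-vertex recursion above is the underlying computation that both GLAT-based DAT and Diff-DAT train through.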

Subject: ACL.2025 - Short Papers