2023.iwslt-1.9@ACL


#1 Length-Aware NMT and Adaptive Duration for Automatic Dubbing

Authors: Zhiqiang Rao ; Hengchao Shang ; Jinlong Yang ; Daimeng Wei ; Zongyao Li ; Jiaxin Guo ; Shaojun Li ; Zhengzhe Yu ; Zhanglin Wu ; Yuhao Xie ; Bin Wei ; Jiawei Zheng ; Lizhi Lei ; Hao Yang

This paper presents the submission of Huawei Translation Services Center for the IWSLT 2023 dubbing task in the unconstrained setting. The proposed solution consists of a Transformer-based machine translation model and a phoneme duration predictor. The translation model is a deep Transformer, and multiple target-to-source length-ratio class labels are used to control target lengths. The variance predictor in FastSpeech2 is used to predict phoneme durations. To optimize isochrony in dubbing, re-ranking and scaling are performed. The source audio duration serves as a reference to re-rank the translations produced under different length-ratio labels, and the one with the minimum time deviation is preferred. Additionally, the predicted phoneme durations are scaled within a defined threshold to narrow the duration gap with the source audio.
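
The re-ranking and scaling steps described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the `Candidate` class, the function names, the label strings, and the `max_change` threshold value of 0.1 are all hypothetical, and the phoneme durations are assumed to be given in seconds by some external duration predictor.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Candidate:
    """One translation hypothesis (hypothetical structure, for illustration)."""
    label: str              # assumed length-ratio class label, e.g. "short"
    text: str               # translated text
    durations: List[float]  # predicted phoneme durations, in seconds


def rerank(candidates: Sequence[Candidate], source_duration: float) -> Candidate:
    """Return the candidate whose total predicted duration is closest
    to the source audio duration (minimum absolute time deviation)."""
    if not candidates:
        raise ValueError("need at least one candidate")
    return min(candidates, key=lambda c: abs(sum(c.durations) - source_duration))


def scale_durations(
    durations: Sequence[float],
    source_duration: float,
    max_change: float = 0.1,  # assumed threshold: at most +/-10% change
) -> List[float]:
    """Scale phoneme durations toward the source duration, clamping the
    scaling factor to the range [1 - max_change, 1 + max_change]."""
    total = sum(durations)
    if total <= 0:
        return list(durations)
    factor = source_duration / total
    factor = max(1.0 - max_change, min(1.0 + max_change, factor))
    return [d * factor for d in durations]


# Usage: three hypotheses decoded with different (assumed) length-ratio labels.
candidates = [
    Candidate("short", "hypothesis A", [0.5, 0.5]),
    Candidate("normal", "hypothesis B", [0.5, 0.5, 1.0]),
    Candidate("long", "hypothesis C", [1.0, 1.0, 1.0]),
]
best = rerank(candidates, source_duration=2.3)
scaled = scale_durations(best.durations, source_duration=2.3)
```

Clamping the factor reflects the abstract's "within a defined threshold": when the gap to the source audio is larger than the threshold allows, the durations move only part of the way, which presumably avoids unnaturally fast or slow speech.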