2024.iwslt-1.2@ACL

#1 Pause-Aware Automatic Dubbing using LLM and Voice Cloning

Authors: Yuang Li ; Jiaxin Guo ; Min Zhang ; Ma Miaomiao ; Zhiqiang Rao ; Weidong Zhang ; Xianghui He ; Daimeng Wei ; Hao Yang

Automatic dubbing aims to translate the speech of a video into another language while ensuring that the new speech naturally fits the original video. This paper details Huawei Translation Services Center’s (HW-TSC) submission to the IWSLT 2024 automatic dubbing task under the unconstrained setting. Our system’s machine translation (MT) component uses a Transformer-based MT model and an LLM-based post-editor to produce translations of varying lengths. The text-to-speech (TTS) component employs a VITS-based TTS model and a voice cloning module to emulate the original speaker’s vocal timbre. To improve dubbing synchrony, we introduce a parsing-informed pause selector. Finally, we rerank multiple results based on lip-sync error distance (LSE-D) and character error rate (CER). Our system achieves LSE-D scores of 10.75 and 12.19 on subset1 and subset2 of the DE-EN test set, respectively, outperforming last year’s best system.
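
The reranking step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not say how LSE-D and CER are combined, so everything below is an assumption made for illustration: the `Candidate` fields, the idea that CER is measured between the text sent to TTS and a transcript of the synthesized audio, and the weighted sum with `cer_weight` are all hypothetical, not the authors' method. Both metrics are treated as lower-is-better.

```python
from dataclasses import dataclass


def edit_distance(ref: str, hyp: str) -> int:
    """Character-level Levenshtein distance between ref and hyp."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(ref: str, hyp: str) -> float:
    """Character error rate: edit distance normalized by reference length."""
    if not ref:
        return 0.0 if not hyp else 1.0
    return edit_distance(ref, hyp) / len(ref)


@dataclass
class Candidate:
    # All fields are hypothetical names chosen for this sketch.
    text: str      # translation text sent to TTS
    asr_text: str  # transcript of the synthesized speech (assumed to come from an ASR system)
    lse_d: float   # lip-sync error distance of the dubbed clip, lower is better


def rerank(cands: list[Candidate], cer_weight: float = 10.0) -> list[Candidate]:
    """Sort candidates by an assumed combined score: LSE-D + cer_weight * CER."""
    return sorted(cands, key=lambda c: c.lse_d + cer_weight * cer(c.text, c.asr_text))


if __name__ == "__main__":
    candidates = [
        Candidate(text="hello", asr_text="hallo", lse_d=10.5),
        Candidate(text="hello", asr_text="hello", lse_d=11.0),
    ]
    best = rerank(candidates)[0]
    print(best)
```

With the assumed default weight, the second candidate wins despite a slightly worse LSE-D, because its synthesized speech is transcribed without errors. Setting `cer_weight=0.0` reduces the ranking to LSE-D alone.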