Pause-Aware Automatic Dubbing using LLM and Voice Cloning | Cool Papers

#1 Pause-Aware Automatic Dubbing using LLM and Voice Cloning

Authors: Yuang Li, Jiaxin Guo, Min Zhang, Ma Miaomiao, Zhiqiang Rao, Weidong Zhang, Xianghui He, Daimeng Wei, Hao Yang

Automatic dubbing aims to translate the speech in a video into another language so that the new speech fits the original video naturally. This paper describes Huawei Translation Services Center’s (HW-TSC) submission to the IWSLT 2024 automatic dubbing task in the unconstrained setting. The machine translation (MT) component of our system combines a Transformer-based MT model with an LLM-based post-editor to produce translations of varying lengths. The text-to-speech (TTS) component pairs a VITS-based TTS model with a voice cloning module that emulates the original speaker’s vocal timbre. To improve dubbing synchrony, we introduce a parsing-informed pause selector. Finally, we rerank multiple candidate outputs by lip-sync error distance (LSE-D) and character error rate (CER). Our system achieves LSE-D scores of 10.75 and 12.19 on subset1 and subset2 of the DE-EN test sets, respectively, outperforming last year’s best system.
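The abstract says that multiple dubbing results are reranked by LSE-D and CER, but it does not say how the two metrics are combined. The following is a minimal, hypothetical sketch of such a reranker, not the authors' implementation. It assumes that CER is computed between the intended translation and an ASR transcript of the synthesized speech, and that each candidate is scored as LSE-D plus a weighted CER (lower is better). The names `DubCandidate`, `rerank`, `asr_transcript`, and the `cer_weight` value are illustrative assumptions. LSE-D itself is normally produced by a pretrained lip-sync model such as SyncNet and is treated here as a given number.

```python
from dataclasses import dataclass
from typing import List


def char_error_rate(reference: str, hypothesis: str) -> float:
    """CER = character-level Levenshtein distance / number of reference characters."""
    if not reference:
        raise ValueError("CER is undefined for an empty reference")
    # Dynamic-programming edit distance, keeping only the previous row.
    prev = list(range(len(hypothesis) + 1))
    for i, r in enumerate(reference, start=1):
        curr = [i] + [0] * len(hypothesis)
        for j, h in enumerate(hypothesis, start=1):
            curr[j] = min(
                prev[j] + 1,             # deletion
                curr[j - 1] + 1,         # insertion
                prev[j - 1] + (r != h),  # substitution (0 if characters match)
            )
        prev = curr
    return prev[-1] / len(reference)


@dataclass
class DubCandidate:
    translation: str     # target-language text that was synthesized
    asr_transcript: str  # hypothetical: ASR output for the synthesized audio
    lse_d: float         # lip-sync error distance (lower is better)


def rerank(candidates: List[DubCandidate], cer_weight: float = 10.0) -> List[DubCandidate]:
    """Sort candidates by LSE-D + cer_weight * CER, best (lowest) first.

    The linear combination and the default weight are assumptions for illustration.
    """
    def score(c: DubCandidate) -> float:
        return c.lse_d + cer_weight * char_error_rate(c.translation, c.asr_transcript)

    return sorted(candidates, key=score)
```

With `cer_weight=10.0`, a candidate with slightly better lip sync can still lose if its speech is less intelligible: an LSE-D of 10.5 with a CER of 1/12 scores about 11.33, which ranks below a perfectly transcribed candidate with an LSE-D of 11.0. A threshold on CER followed by choosing the lowest LSE-D would be an equally plausible alternative design.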

Subject: IWSLT.2024

2024.iwslt-1.2@ACL
