meyer24b@interspeech_2024@ISCA


RAST: A Reference-Audio Synchronization Tool for Dubbed Content

Authors: David Meyer; Eitan Abecassis; Clara Fernandez-Labrador; Christopher Schroers

In the film industry, audio-video synchronization issues are considered major quality defects and key drivers of viewer disengagement. This is especially true for dubbed content, which is more prone to these errors due to the added manual process of replacing the original speech with a translated version. Despite their potential benefit for dubbed media production, automatic sync detection methods are seldom explored. In this paper, we propose a Transformer-based Siamese network for dubbed audio synchronization detection. Based on a large dataset of dubbed entertainment, we demonstrate that, compared to previous methods, our approach is more robust in detecting the misalignment introduced by translated speech segments. Beyond addressing the previously studied constant synchronization errors, our model is the first to handle the frequent issue of intermittent offsets.