Optical Flow Guided Tongue Trajectory Generation for Diffusion-based Acoustic to Articulatory Inversion

Authors: Yudong Yang, Rongfeng Su, Rukiye Ruzi, Manwa Ng, Shaofeng Zhao, Nan Yan, Lan Wang

The diffusion-based Acoustic-to-Articulatory Inversion (AAI) approach has shown impressive results in converting audio into Ultrasound Tongue Imaging (UTI) data with clear tongue contours. However, Mean Square Error (MSE) based diffusion models focus on the pixel error between reference and generated UTI data and therefore do not account for changes in tongue movement. This leads to a discrepancy in tongue trajectory between the reference and generated UTI data. To address this issue, this paper presents an optical flow guided tongue trajectory generation method for training the diffusion-based AAI model. The optical flow method calculates the displacement of the tongue contours across consecutive frames, so that the tongue trajectory similarity between reference and generated UTI data can be used as an additional constraint when optimizing the diffusion model network. Experimental results show that the proposed diffusion-based AAI system with the additional tongue trajectory constraint outperformed the baseline system across various evaluation metrics.
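The idea of adding a trajectory term to a pixel-wise loss can be illustrated with a small, dependency-free sketch. It is not the authors' implementation: the abstract does not say which optical flow algorithm, loss weighting, or network is used. The sketch assumes a single global Lucas-Kanade displacement per frame pair, a squared-error comparison of the resulting flow vectors, and a hypothetical weight of 0.1. All function names (`gaussian_blob`, `global_flow`, `trajectory_loss`, `total_loss`) are made up for illustration, and a moving Gaussian spot stands in for UTI frames.

```python
import math

def gaussian_blob(cx, cy, size=21, sigma=3.0):
    """Toy stand-in for a UTI frame: a smooth bright spot centred at (cx, cy)."""
    return [[math.exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2 * sigma ** 2))
             for x in range(size)] for y in range(size)]

def global_flow(prev, curr):
    """Estimate one (dx, dy) displacement between two frames.

    Assumption: a single least-squares Lucas-Kanade step over all interior
    pixels, solving Ix*dx + Iy*dy + It = 0 in the least-squares sense.
    """
    h, w = len(prev), len(prev[0])
    sxx = sxy = syy = sxt = syt = 0.0
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            ix = (prev[y][x + 1] - prev[y][x - 1]) / 2.0  # spatial gradient in x
            iy = (prev[y + 1][x] - prev[y - 1][x]) / 2.0  # spatial gradient in y
            it = curr[y][x] - prev[y][x]                  # temporal difference
            sxx += ix * ix
            sxy += ix * iy
            syy += iy * iy
            sxt += ix * it
            syt += iy * it
    det = sxx * syy - sxy * sxy
    if abs(det) < 1e-12:  # textureless frame: no reliable estimate
        return (0.0, 0.0)
    dx = (-syy * sxt + sxy * syt) / det
    dy = (sxy * sxt - sxx * syt) / det
    return (dx, dy)

def trajectory(frames):
    """Flow vectors between each pair of consecutive frames."""
    return [global_flow(a, b) for a, b in zip(frames, frames[1:])]

def pixel_mse(ref, gen):
    """Mean squared pixel error over all frames."""
    diffs = [(r - g) ** 2
             for frame_r, frame_g in zip(ref, gen)
             for row_r, row_g in zip(frame_r, frame_g)
             for r, g in zip(row_r, row_g)]
    return sum(diffs) / len(diffs)

def trajectory_loss(ref, gen):
    """Mean squared difference between reference and generated flow vectors."""
    pairs = list(zip(trajectory(ref), trajectory(gen)))
    return sum((ur - ug) ** 2 + (vr - vg) ** 2
               for (ur, vr), (ug, vg) in pairs) / len(pairs)

def total_loss(ref, gen, weight=0.1):
    """Pixel MSE plus a weighted trajectory term (weight is a hypothetical value)."""
    return pixel_mse(ref, gen) + weight * trajectory_loss(ref, gen)

# Reference sequence moves one pixel per frame; the "generated" one stays still.
reference = [gaussian_blob(10 + t, 10) for t in range(3)]
static = [gaussian_blob(10, 10) for _ in range(3)]
print(total_loss(reference, reference))
print(trajectory_loss(reference, static))
```

Identical sequences give a loss of zero, while the static sequence is penalized by the trajectory term for not moving, on top of its pixel error. In an actual training setup the flow estimate would need to be differentiable with respect to the generated frames so that the extra term can influence the network's gradients.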

Subject: INTERSPEECH.2024 - Others

yang24o@interspeech_2024@ISCA
