2501.15613

Stepback: Enhanced Disentanglement for Voice Conversion via Multi-Task Learning

Authors: Qian Yang, Calbert Graham

Voice conversion (VC) modifies voice characteristics while preserving linguistic content. This paper presents the Stepback network, a novel model for converting speaker identity using non-parallel data. Unlike traditional VC methods that rely on parallel data, our approach leverages deep learning techniques to enhance disentanglement completion and linguistic content preservation. The Stepback network incorporates a dual flow of different domain data inputs and uses constraints with self-destructive amendments to optimize the content encoder. Extensive experiments show that our model significantly improves VC performance, reducing training costs while achieving high-quality voice conversion. The Stepback network's design offers a promising solution for advanced voice conversion tasks.

Subjects: Sound, Computation and Language, Audio and Speech Processing

Published: 2025-01-26 17:43:32 UTC