FreeSonic: Training-Free Temporal-Aware Decoupled Attention for Precise Audio Editing

Authors: Yuxuan Jiang, Mingyang Han, Yusheng Dai, Andong Wang, Tianhong Zhou, Jiaxin Ye, Dongxiao Wang, Haoxiang Shi, Boyu Li, Jun Song, Cheng Yu, Bo Zheng, Weibei Dou, Zehua Chen, Jun Zhu

Text-to-audio (TTA) generation has made significant strides, yet precise and consistent audio editing remains a major challenge: existing methods struggle to balance temporal consistency with background preservation. In this paper, we propose FreeSonic, a training-free framework built on TangoFlux, a state-of-the-art Rectified Flow-based TTA model. FreeSonic uses an optimized inversion-reverse process and joint text-audio attention maps to extract the target segment precisely. For content editing, a novel scheduled attention decoupling mechanism confines modifications to the target regions while preserving the original acoustic context. Furthermore, task-oriented noise injection extends the framework to tasks such as audio removal and non-rigid replacement. Extensive experiments demonstrate that FreeSonic achieves a superior balance between temporal consistency and background preservation, offering a high-fidelity and efficient solution for precise and consistent audio editing. Project and demos: https://free-sonic.github.io/
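
The abstract combines three ideas: localizing the target sound with text-audio attention, inverting and regenerating a latent along a Rectified Flow trajectory, and restricting the edit to the target frames. The sketch below is a minimal NumPy illustration under stated assumptions, not the authors' implementation. The attention layout `(heads, frames, tokens)`, the 0.5 threshold, the time convention (t = 0 is the audio latent, t = 1 is noise), the toy constant velocity fields standing in for TangoFlux, and the helper names `temporal_mask`, `invert`, `inject_noise` and `masked_reverse` are all hypothetical. It also swaps the paper's scheduled attention decoupling for a simpler per-frame velocity mask, which only approximates the idea of confining edits to target frames.

```python
import numpy as np


def temporal_mask(attn, token_ids, threshold=0.5):
    """Per-frame binary mask from audio-to-text attention (illustrative).

    attn: (heads, frames, tokens) weights from audio queries to text keys.
    token_ids: indices of the text tokens that name the sound to edit.
    """
    # Average over heads, then pool the target tokens: one score per frame.
    scores = attn.mean(axis=0)[:, token_ids].sum(axis=1)
    lo, hi = scores.min(), scores.max()
    if hi - lo < 1e-12:  # flat map: nothing can be localized
        return np.zeros_like(scores)
    scores = (scores - lo) / (hi - lo)  # min-max normalize to [0, 1]
    return (scores >= threshold).astype(scores.dtype)


def invert(x0, velocity, steps=10):
    """Euler steps of dx/dt = v(x, t) from data (t=0) to noise (t=1)."""
    x, dt = x0.copy(), 1.0 / steps
    for i in range(steps):
        x = x + dt * velocity(x, i * dt)
    return x


def inject_noise(x1, mask, strength, rng):
    """Hypothetical: mix fresh Gaussian noise into the masked frames of x1.

    The sqrt weights preserve unit variance when x1 and the noise are
    independent with unit variance; unmasked frames are returned unchanged.
    """
    m = mask[:, None] * strength
    return np.sqrt(1.0 - m) * x1 + np.sqrt(m) * rng.standard_normal(x1.shape)


def masked_reverse(x1, v_src, v_tgt, mask, steps=10):
    """Euler steps from t=1 back to t=0, steering only the masked frames.

    x1: (frames, channels) latent; mask: (frames,) with values in {0, 1}.
    """
    x, dt = x1.copy(), 1.0 / steps
    m = mask[:, None]  # broadcast the frame mask over latent channels
    for i in range(steps, 0, -1):
        t = i * dt
        v = m * v_tgt(x, t) + (1.0 - m) * v_src(x, t)
        x = x - dt * v
    return x


# Toy usage: token 2 attends strongly to frames 3..5 in both heads.
frames, channels, tokens = 8, 4, 5
attn = np.full((2, frames, tokens), 0.1)
attn[:, 3:6, 2] = 0.9
mask = temporal_mask(attn, token_ids=[2])
print(mask)  # [0. 0. 0. 1. 1. 1. 0. 0.]

rng = np.random.default_rng(0)
x0 = rng.standard_normal((frames, channels))


def v_src(x, t):  # constant stand-in for the source-prompt velocity
    return np.ones_like(x)


def v_tgt(x, t):  # constant stand-in for the edited-prompt velocity
    return -np.ones_like(x)


x1 = invert(x0, v_src)
x_edit = masked_reverse(x1, v_src, v_tgt, mask)
print(np.allclose(x_edit[mask == 0], x0[mask == 0]))  # True: background kept
```

Because the toy velocities do not depend on `x`, the Euler round trip is exact up to floating-point error, so unmasked frames return to `x0`; with a learned, x-dependent velocity field the round trip is only approximate. For removal or non-rigid replacement, one could apply something like `inject_noise(x1, mask, 0.5, rng)` to `x1` before `masked_reverse` so the masked frames are less tied to the inverted content; the strength value here is an arbitrary assumption.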

Subjects: Sound, Artificial Intelligence, Audio and Speech Processing

Published: 2026-06-13 08:22:20 UTC

2606.15186
