DTT-BSR+: A Generative-Regression Cascade for Music Source Restoration

Authors: Youran Ni, Shihong Tan, Yuzhu Wang, Gongping Huang

Music source restoration (MSR) requires jointly addressing source unmixing and the inversion of non-linear production effects. Current methods struggle to reconstruct the target signal accurately while maintaining semantic consistency. To address this limitation, we propose DTT-BSR+, a two-stage cascade MSR system that decouples distribution fitting from signal reconstruction by assigning each to its own stage. In the first stage, a generative DTT-BSR separator produces stems that match the prior of clean sources; in the second stage, a modified Demucs network refines the first-stage output using time-domain and multi-resolution spectral losses. DTT-BSR+ improves the multi-mel signal-to-noise ratio (MMSNR) over the single-stage DTT-BSR on all stems and surpasses the state-of-the-art X-LANCE MSR system on five stems. Through a Fréchet Audio Distance (FAD) decomposition, we also reveal an implicit trade-off between signal reconstruction accuracy and semantic distribution fitting across stems.
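The abstract does not specify the second-stage objective beyond "time-domain and multi-resolution spectral losses". As a rough illustration of what such an objective commonly looks like, the following NumPy sketch combines an L1 waveform loss with L1 magnitude-spectrogram losses at several STFT resolutions. The function names, the FFT/hop sizes, the L1 choice, and the `spectral_weight` parameter are all assumptions made for illustration, not details from the paper.

```python
import numpy as np


def stft_mag(x, n_fft, hop):
    """Magnitude STFT of a 1-D signal with a Hann window and no padding."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack(
        [x[i * hop : i * hop + n_fft] * window for i in range(n_frames)]
    )
    return np.abs(np.fft.rfft(frames, axis=-1))


def multi_resolution_spectral_loss(
    est, ref, resolutions=((512, 128), (1024, 256), (2048, 512))
):
    """Mean L1 distance between magnitude spectrograms, averaged over
    several (n_fft, hop) pairs. The resolutions are illustrative only."""
    losses = []
    for n_fft, hop in resolutions:
        diff = stft_mag(est, n_fft, hop) - stft_mag(ref, n_fft, hop)
        losses.append(np.mean(np.abs(diff)))
    return float(np.mean(losses))


def combined_loss(est, ref, spectral_weight=1.0):
    """Time-domain L1 loss plus a weighted multi-resolution spectral loss.
    The weighting scheme is a hypothetical choice, not the paper's."""
    time_loss = float(np.mean(np.abs(est - ref)))
    return time_loss + spectral_weight * multi_resolution_spectral_loss(est, ref)
```

In a cascade of this kind, `est` would be the second-stage output and `ref` the clean target stem. Both signals must be at least as long as the largest `n_fft`; the loss is zero when they are identical and grows as the estimate departs from the reference.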

Subjects: Audio and Speech Processing, Artificial Intelligence, Sound

Published: 2026-06-23 04:22:20 UTC

2606.24127
