kim24o@interspeech_2024@ISCA

#1 Guided conditioning with predictive network on score-based diffusion model for speech enhancement

Authors: Dail Kim ; Da-Hee Yang ; Donghyun Kim ; Joon-Hyuk Chang ; Jeonghwan Choi ; Moa Lee ; Jaemo Yang ; Han-gil Moon

Although diffusion-based speech enhancement (SE) models have emerged, they remove noise less effectively than predictive SE models. This reflects a trade-off: generative models can produce more natural speech because they estimate the target distribution, whereas predictive models are more effective at noise removal. To mitigate this trade-off, we propose a novel conditioning method for score-based diffusion models. The approach guides the diffusion model with a pretrained predictive model, without joint training, so that the enhanced speech provides the diffusion model with the proper direction. The proposed method outperforms the baseline while using only half the number of sampling steps.
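The conditioning idea in the abstract can be illustrated with a toy sketch. It is not the authors' implementation: `predictive_enhance` (a moving average) and `toy_score` (a closed-form Gaussian score) are hypothetical stand-ins for the pretrained predictive network and the trained score network, and the noise schedule, step count, and `guide_weight` are assumed values chosen only so the example runs. What it shows is the structure: the predictive model is run once and kept frozen, and its output is passed to the score function as an extra condition at every reverse-diffusion step.

```python
import math
import random


def predictive_enhance(noisy, width=9):
    """Hypothetical stand-in for a frozen, pretrained predictive SE model.

    A moving average is used only so the sketch runs; a real system would
    call a trained network here.
    """
    half = width // 2
    out = []
    for i in range(len(noisy)):
        window = noisy[max(0, i - half): i + half + 1]
        out.append(sum(window) / len(window))
    return out


def toy_score(x, noisy, guide, sigma, guide_weight=0.8):
    """Hypothetical stand-in for a score network conditioned on (noisy, guide).

    Returns the score of a Gaussian centred on a blend of the two
    conditioning signals; a real score model would be learned.
    """
    return [
        (guide_weight * g + (1.0 - guide_weight) * y - xi) / sigma ** 2
        for xi, y, g in zip(x, noisy, guide)
    ]


def guided_reverse_diffusion(noisy, n_steps=30, sigma_max=0.5,
                             sigma_min=0.01, seed=0):
    """Reverse-diffusion sampling with a variance-exploding noise schedule."""
    rng = random.Random(seed)
    # The predictive model runs once and is never updated (no joint training).
    guide = predictive_enhance(noisy)
    # Geometric noise schedule from sigma_max down to sigma_min (assumed).
    ratio = (sigma_min / sigma_max) ** (1.0 / n_steps)
    sigmas = [sigma_max * ratio ** i for i in range(n_steps + 1)]
    # Start from the noisy signal plus Gaussian noise at the largest scale.
    x = [y + sigma_max * rng.gauss(0.0, 1.0) for y in noisy]
    for i in range(n_steps):
        step = sigmas[i] ** 2 - sigmas[i + 1] ** 2
        # The guide enters as an extra condition at every step.
        score = toy_score(x, noisy, guide, sigmas[i])
        x = [xi + step * s + math.sqrt(step) * rng.gauss(0.0, 1.0)
             for xi, s in zip(x, score)]
    return x


def mse(a, b):
    """Mean squared error between two equal-length sequences."""
    return sum((p - q) ** 2 for p, q in zip(a, b)) / len(a)


# Synthetic demo: a sine wave stands in for clean speech.
demo_rng = random.Random(1)
clean = [math.sin(2 * math.pi * 5 * n / 400) for n in range(400)]
noisy = [c + 0.3 * demo_rng.gauss(0.0, 1.0) for c in clean]
enhanced = guided_reverse_diffusion(noisy)
print("MSE noisy:", mse(noisy, clean), "MSE enhanced:", mse(enhanced, clean))
```

Because the score here is a closed-form placeholder, the numbers it prints say nothing about the paper's results, including the reported reduction in sampling steps; the sketch only makes the data flow between the two models concrete.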