Repurposing a Speech Classifier for Guided Diffusion-Based Speech Generation

Authors: Rostislav Makarov, Timo Gerkmann

Classifier guidance is a way to control diffusion generation by using a noise-conditioned classifier to steer the sampling process toward a target class. One drawback of classifier guidance is that it requires two separately trained models: a classifier and a diffusion model. We therefore study a more compact alternative in which a conventionally trained speech classifier is repurposed as the backbone for diffusion generation. Starting from a frozen noise-conditioned classifier in log-Mel space, we attach a lightweight subnetwork that reuses intermediate classifier representations and train only this subnetwork under a denoising score matching objective. Our work shows that a pretrained classifier can be repurposed for conditional generation, providing an appealing bridge between discriminative modeling and conditional speech synthesis. The resulting single-backbone model achieves high speech quality with a reduced memory footprint and computational cost.
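
The training setup described in the abstract (a frozen backbone whose features feed a small trainable head, fitted with denoising score matching) can be illustrated with a toy sketch. This is not the authors' implementation: the 1-D Gaussian data, the single noise level SIGMA, the feature map frozen_features, and the linear head are all hypothetical stand-ins chosen so the example runs with the Python standard library alone. It uses the sigma-weighted form of the objective, E[(sigma * s(x_t) + eps)^2] with x_t = x0 + sigma * eps.

```python
import random

SIGMA = 0.5  # single noise level, an assumption made for illustration only


def frozen_features(x_t):
    """Hypothetical stand-in for intermediate activations of a frozen classifier."""
    return [x_t, 1.0]


def head(w, feats):
    """Trainable linear head mapping frozen features to a score estimate."""
    return sum(wi * fi for wi, fi in zip(w, feats))


def make_batch(n, rng):
    """Draw clean samples x0 ~ N(0, 1) and perturb them: x_t = x0 + SIGMA * eps."""
    batch = []
    for _ in range(n):
        x0 = rng.gauss(0.0, 1.0)
        eps = rng.gauss(0.0, 1.0)
        batch.append((x0 + SIGMA * eps, eps))
    return batch


def dsm_loss_and_grad(w, batch):
    """Sigma-weighted DSM loss E[(SIGMA * s(x_t) + eps)^2] and its gradient in w."""
    loss = 0.0
    grad = [0.0] * len(w)
    for x_t, eps in batch:
        feats = frozen_features(x_t)
        residual = SIGMA * head(w, feats) + eps
        loss += residual ** 2
        for i, f in enumerate(feats):
            grad[i] += 2.0 * residual * SIGMA * f
    n = len(batch)
    return loss / n, [g / n for g in grad]


def train_head(steps=40, lr=1.0, n=10000, seed=0):
    """Full-batch gradient descent on the head weights; the feature map is never updated."""
    rng = random.Random(seed)
    batch = make_batch(n, rng)
    w = [0.0, 0.0]
    for _ in range(steps):
        _, grad = dsm_loss_and_grad(w, batch)
        w = [wi - lr * gi for wi, gi in zip(w, grad)]
    return w


w = train_head()
print("learned head weights:", w)
```

In this toy setting the noisy samples follow N(0, 1 + SIGMA^2), whose score is -x / (1 + SIGMA^2), so the learned slope w[0] should land near -0.8 and the bias w[1] near 0. The point of the sketch is only that gradients touch the head weights while the feature extractor stays fixed; a real system would replace both with neural networks operating on log-Mel inputs and use a range of noise levels.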

Subjects: Audio and Speech Processing, Artificial Intelligence, Machine Learning

Publish: 2026-06-18 16:40:02 UTC

2606.20457
