SCott: Accelerating Diffusion Models with Stochastic Consistency Distillation

Authors: Hongjian Liu, Qingsong Xie, Tianxiang Ye, Zhijie Deng, Chen Chen, Shixiang Tang, Xueyang Fu, Haonan Lu, Zheng-Jun Zha

The iterative sampling procedure employed by diffusion models (DMs) often leads to significant latency. To address this, we propose Stochastic Consistency Distillation (SCott) to enable accelerated text-to-image generation, where high-quality generations can be achieved with just 2-4 sampling steps, or even 1 step, and further improvements can be obtained at additional cost (e.g., 4 steps). In contrast to vanilla consistency distillation (CD), which distills the ordinary differential equation (ODE) solver-based sampling process of a pre-trained teacher model into a student, SCott explores the possibility and validates the efficacy of integrating stochastic differential equation (SDE) solvers into CD to fully unleash the potential of the teacher. SCott is augmented with elaborate strategies to control the noise strength and sampling process of the SDE solver. An adversarial loss is further incorporated to strengthen sample quality when very few sampling steps are used. Empirically, on the MSCOCO-2017 5K dataset with a Stable Diffusion-V1.5 teacher, SCott achieves an FID of 21.9, surpassing the 1-step InstaFlow (23.4) and the 4-step UFOGen (22.1). Moreover, SCott can yield more diverse samples than other consistency models for high-resolution image generation, with up to a 16% improvement in a qualified metric.
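To make the ODE-versus-SDE distinction above concrete, the sketch below shows a generic DDIM-style update in which a single parameter `eta` sets the noise strength: `eta = 0` gives the deterministic, ODE-like step that vanilla CD distills, while `eta > 0` injects fresh Gaussian noise, giving a stochastic, SDE-like step. This is a standard textbook illustration, not SCott's actual solver or noise-control strategy; the function name `ddim_step`, the toy `alpha_bar` values, and the random stand-in for a model's noise prediction are all hypothetical.

```python
import numpy as np

def ddim_step(x_t, eps_pred, alpha_bar_t, alpha_bar_prev, eta, rng):
    """One DDIM-style update from timestep t to an earlier timestep (hypothetical helper).

    alpha_bar_* are cumulative signal coefficients in (0, 1], with
    alpha_bar_prev >= alpha_bar_t. eta = 0.0 gives a deterministic (ODE-like)
    step; eta > 0.0 adds Gaussian noise (SDE-like), and eta = 1.0 matches
    the DDPM posterior variance.
    """
    # Estimate the clean sample x_0 from the current noisy sample and predicted noise.
    x0_pred = (x_t - np.sqrt(1.0 - alpha_bar_t) * eps_pred) / np.sqrt(alpha_bar_t)
    # Standard deviation of the injected noise, scaled by eta (the noise-strength knob).
    sigma = eta * np.sqrt(
        (1.0 - alpha_bar_prev) / (1.0 - alpha_bar_t) * (1.0 - alpha_bar_t / alpha_bar_prev)
    )
    # Deterministic component pointing back toward x_t.
    dir_xt = np.sqrt(1.0 - alpha_bar_prev - sigma**2) * eps_pred
    # Fresh noise is only drawn when the step is stochastic.
    noise = rng.standard_normal(x_t.shape) if eta > 0 else np.zeros_like(x_t)
    return np.sqrt(alpha_bar_prev) * x0_pred + dir_xt + sigma * noise

# Usage: the same inputs stepped with two different random seeds.
x_t = np.random.default_rng(0).standard_normal(4)
eps = np.random.default_rng(42).standard_normal(4)  # stand-in for a model's noise prediction
det_a = ddim_step(x_t, eps, 0.5, 0.8, eta=0.0, rng=np.random.default_rng(1))
det_b = ddim_step(x_t, eps, 0.5, 0.8, eta=0.0, rng=np.random.default_rng(2))
sto_a = ddim_step(x_t, eps, 0.5, 0.8, eta=1.0, rng=np.random.default_rng(1))
sto_b = ddim_step(x_t, eps, 0.5, 0.8, eta=1.0, rng=np.random.default_rng(2))
print(np.allclose(det_a, det_b))  # True: the eta = 0 step ignores the RNG
print(np.allclose(sto_a, sto_b))  # False: the eta > 0 step draws fresh noise
```

In a consistency-distillation loop, a solver step like this would map the teacher from one timestep to the adjacent earlier one to produce the student's target. Raising `eta` trades determinism for stochasticity, which is the general kind of noise-strength control the abstract refers to, though SCott's specific strategies are not reproduced here.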

Subject: AAAI.2025 - Computer Vision