
Total: 1

#1 QuEST: Low-bit Diffusion Model Quantization via Efficient Selective Finetuning

Authors: Haoxuan Wang, Yuzhang Shang, Zhihang Yuan, Junyi Wu, Junchi Yan, Yan Yan

The practical deployment of diffusion models is still hindered by their high memory and computational overhead. Although quantization paves the way for model compression and acceleration, existing methods struggle to achieve low-bit quantization efficiently. In this paper, we identify imbalanced activation distributions as a primary source of quantization difficulty, and propose to adjust these distributions through weight finetuning so that they become more quantization-friendly. We provide both theoretical and empirical evidence that finetuning is a practical and reliable solution. Building on this approach, we further distinguish two critical types of quantized layers: those that retain essential temporal information and those that are particularly sensitive to bit-width reduction. By selectively finetuning these layers under both local and global supervision, we mitigate performance degradation while improving quantization efficiency. Our method demonstrates its efficacy across three high-resolution image generation tasks, obtaining state-of-the-art performance under multiple bit-width settings.
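
For intuition, here is a minimal, self-contained PyTorch sketch of the selective-finetuning idea described in the abstract (not the authors' implementation). It fake-quantizes activations, ranks layers by the output error quantization introduces, and then finetunes only the most sensitive layers under a combined global (network-output) and local (per-layer) MSE loss against the full-precision model. All names (`fake_quant`, `QuantLinear`, `selective_finetune`), the toy MLP standing in for a diffusion UNet, and hyperparameters such as `top_k` and `lam` are illustrative assumptions; the paper's temporal-information layer selection is not modeled here.

```python
import copy
import torch
import torch.nn as nn
import torch.nn.functional as F

def fake_quant(x, n_bits=4):
    """Uniform min-max fake quantization with a straight-through estimator."""
    lo, hi = x.min(), x.max()
    scale = (hi - lo).clamp(min=1e-8) / (2 ** n_bits - 1)
    q = torch.round((x - lo) / scale) * scale + lo
    return x + (q - x).detach()  # forward: quantized values; backward: identity

class QuantLinear(nn.Linear):
    """Linear layer whose input activations are fake-quantized."""
    def forward(self, x):
        return F.linear(fake_quant(x), self.weight, self.bias)

def quantize(fp_model):
    """Copy fp_model, swapping every nn.Linear for a QuantLinear."""
    qmodel = copy.deepcopy(fp_model)
    swaps = [(parent, name, child) for parent in qmodel.modules()
             for name, child in parent.named_children()
             if type(child) is nn.Linear]
    for parent, name, child in swaps:
        q = QuantLinear(child.in_features, child.out_features, child.bias is not None)
        q.load_state_dict(child.state_dict())
        setattr(parent, name, q)
    return qmodel

def capture(model, names):
    """Register hooks that record the outputs of the named submodules."""
    store, handles = {}, []
    for name, mod in model.named_modules():
        if name in names:
            handles.append(mod.register_forward_hook(
                lambda m, i, o, n=name: store.__setitem__(n, o)))
    return store, handles

def selective_finetune(qmodel, fp_model, calib_x, top_k=2, steps=200, lam=0.5):
    names = [n for n, m in qmodel.named_modules() if isinstance(m, QuantLinear)]
    # Record full-precision layer outputs and the network output as targets.
    fp_store, fp_handles = capture(fp_model, names)
    with torch.no_grad():
        fp_out = fp_model(calib_x)
    for h in fp_handles:
        h.remove()
    # Rank layers by how far quantization pushes their outputs from full precision.
    q_store, q_handles = capture(qmodel, names)
    with torch.no_grad():
        qmodel(calib_x)
    err = {n: F.mse_loss(q_store[n], fp_store[n]).item() for n in names}
    selected = sorted(err, key=err.get, reverse=True)[:top_k]
    # Freeze everything except the selected (most sensitive) layers.
    for p in qmodel.parameters():
        p.requires_grad_(False)
    params = [p for n, m in qmodel.named_modules() if n in selected
              for p in m.parameters()]
    for p in params:
        p.requires_grad_(True)
    opt = torch.optim.Adam(params, lr=1e-4)
    for _ in range(steps):
        opt.zero_grad()
        out = qmodel(calib_x)                       # hooks refill q_store
        loss = F.mse_loss(out, fp_out)              # global supervision
        loss = loss + lam * sum(F.mse_loss(q_store[n], fp_store[n])
                                for n in selected)  # local supervision
        loss.backward()
        opt.step()
    for h in q_handles:
        h.remove()
    return selected

# Toy usage: a small MLP stands in for a diffusion UNet block.
fp = nn.Sequential(nn.Linear(16, 32), nn.ReLU(),
                   nn.Linear(32, 32), nn.ReLU(),
                   nn.Linear(32, 16)).eval()
qnet = quantize(fp)
print("finetuned layers:", selective_finetune(qnet, fp, torch.randn(64, 16)))
```

In the paper's setting, this kind of supervision would be applied across the diffusion model's denoising timesteps rather than on a single toy calibration batch; the sketch only illustrates the layer-selection and dual-loss mechanics.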

Subject: ICCV.2025 - Poster