QVGen: Pushing the Limit of Quantized Video Generative Models | Cool Papers

#1 QVGen: Pushing the Limit of Quantized Video Generative Models

Authors: Yushi Huang, Ruihao Gong, Jing Liu, Yifu Ding, Chengtao Lv, Haotong Qin, Jun Zhang

Video diffusion models (DMs) have enabled high-quality video synthesis. Yet their substantial computational and memory demands pose serious challenges to real-world deployment, even on high-end GPUs. Quantization, a commonly adopted solution, has shown notable success in reducing cost for image DMs, but its direct application to video DMs remains ineffective. In this paper, we present QVGen, a novel quantization-aware training (QAT) framework tailored for high-performance and inference-efficient video DMs under extremely low-bit quantization (e.g., 4-bit or below). We begin with a theoretical analysis demonstrating that reducing the gradient norm is essential to facilitate convergence for QAT. To this end, we introduce auxiliary modules ($\Phi$) to mitigate large quantization errors, which significantly improves convergence. To eliminate the inference overhead of $\Phi$, we propose a rank-decay strategy that progressively removes $\Phi$. Specifically, we repeatedly employ singular value decomposition (SVD) and a proposed rank-based regularization $\mathbf{\gamma}$ to identify and decay low-contributing components. This strategy retains performance while reducing the inference overhead to zero. Extensive experiments across four state-of-the-art (SOTA) video DMs, with parameter sizes ranging from $1.3$B to $14$B, show that QVGen is the first to reach quality comparable to full precision under 4-bit settings. Moreover, it significantly outperforms existing methods. For instance, our 3-bit CogVideoX-2B achieves improvements of $+25.28$ in Dynamic Degree and $+8.43$ in Scene Consistency on VBench.
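The rank-decay idea can be illustrated with a toy NumPy sketch. This is not the paper's implementation: the abstract does not give the form of $\Phi$ or of the regularization $\mathbf{\gamma}$, so the sketch assumes $\Phi$ is simply the quantization residual of a single weight matrix and replaces the learned, gradual decay with hard truncation of the smallest singular components. The names `fake_quantize` and `truncate_rank` are hypothetical helpers, and no training is involved.

```python
import numpy as np

def fake_quantize(w, bits=4):
    """Symmetric per-tensor uniform quantization, then dequantization."""
    qmax = 2 ** (bits - 1) - 1
    scale = np.abs(w).max() / qmax
    return np.clip(np.round(w / scale), -qmax - 1, qmax) * scale

def truncate_rank(phi, keep):
    """Keep only the `keep` largest singular components of phi (via SVD)."""
    u, s, vt = np.linalg.svd(phi, full_matrices=False)
    s = s.copy()
    s[keep:] = 0.0  # drop the low-contributing components
    return (u * s) @ vt

rng = np.random.default_rng(0)
w = rng.standard_normal((64, 64))   # stand-in for a full-precision weight
w_q = fake_quantize(w, bits=4)      # 4-bit fake-quantized weight
phi = w - w_q                       # ASSUMPTION: auxiliary module = quantization residual

# Shrink the rank of phi step by step until it vanishes entirely.
ranks = [64, 32, 16, 8, 0]
errors = []
for r in ranks:
    phi_r = truncate_rank(phi, r)
    errors.append(np.linalg.norm(w - (w_q + phi_r)))

for r, e in zip(ranks, errors):
    print(f"rank {r:2d}: reconstruction error {e:.4f}")
```

At full rank the auxiliary term cancels the quantization error; at rank 0 it is gone, so nothing extra is computed at inference and the error equals the plain quantization error. In QVGen the model is trained while the rank decays, which is what lets it retain quality; this sketch only shows the SVD bookkeeping.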

Subject: Computer Vision and Pattern Recognition

Publish: 2025-05-16 17:59:40 UTC

arXiv: 2505.11497
