Wu_Importance-Based_Token_Merging_for_Efficient_Image_and_Video_Generation@ICCV2025@CVF

Importance-Based Token Merging for Efficient Image and Video Generation

Authors: Haoyu Wu, Jingyi Xu, Hieu Le, Dimitris Samaras

Token merging can effectively accelerate various vision systems by processing groups of similar tokens only once and sharing the results across them. However, existing token grouping methods are often ad hoc and random, disregarding the actual content of the samples. We show that preserving high-information tokens during merging (those essential for semantic fidelity and structural details) significantly improves sample quality, producing finer details and more coherent, realistic generations. To do so, we propose an importance-based token merging method that prioritizes the most critical tokens when allocating computational resources, leveraging readily available importance scores, such as those from classifier-free guidance in diffusion models. Experiments show that our approach significantly outperforms baseline methods across multiple applications, including text-to-image synthesis, multi-view image generation, and video generation, with various model architectures such as Stable Diffusion, Zero123++, AnimateDiff, and PixArt-α.
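
The idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify how importance is computed or how tokens are grouped. The sketch assumes that importance is the per-token magnitude of the gap between the conditional and unconditional predictions used in classifier-free guidance, that the `keep` highest-scoring tokens are processed, and that every other token reuses the result of its most cosine-similar kept token. All function names (`cfg_importance`, `build_merge_map`, `merged_apply`) are hypothetical.

```python
import math


def cfg_importance(eps_cond, eps_uncond):
    """Assumed importance score: L2 norm of (conditional - unconditional) per token."""
    return [
        math.sqrt(sum((c - u) ** 2 for c, u in zip(tok_c, tok_u)))
        for tok_c, tok_u in zip(eps_cond, eps_uncond)
    ]


def cosine(a, b):
    """Cosine similarity between two token vectors (small epsilon avoids 0/0)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b + 1e-8)


def build_merge_map(tokens, importance, keep):
    """Keep the `keep` most important tokens; map the rest to their nearest kept token."""
    order = sorted(range(len(tokens)), key=lambda i: importance[i], reverse=True)
    kept = sorted(order[:keep])
    kept_set = set(kept)
    assign = {}
    for i in range(len(tokens)):
        if i in kept_set:
            assign[i] = i  # important tokens are always processed themselves
        else:
            assign[i] = max(kept, key=lambda j: cosine(tokens[i], tokens[j]))
    return kept, assign


def merged_apply(tokens, importance, keep, fn):
    """Run `fn` only on kept tokens, then share each result with its merged tokens."""
    kept, assign = build_merge_map(tokens, importance, keep)
    results = {j: fn(tokens[j]) for j in kept}
    return [results[assign[i]] for i in range(len(tokens))]
```

With four tokens and `keep=2`, `fn` runs twice instead of four times. The contrast with random grouping is that the tokens that are actually processed are chosen by score rather than by chance, so high-information tokens are never replaced by a neighbor's result.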

Subject: ICCV.2025 - Oral