2605.16384

Mutual Enhancement Between Global Tokens and Patch Tokens: From Theory to Practice

Authors: Xiusheng Huang, Xin Jiang, Jun Zhao, Kang Liu, Yequan Wang

Accurate and effective discrete image tokenization is crucial for processing long image sequences. However, current methods rigidly compress all content at a fixed rate, ignoring the variable information density of images and leading to either redundancy or information loss. Inspired by information entropy, we propose TaTok, a Theoretically grounded adaptive image Tokenization framework. We rigorously identify two key drawbacks in existing methods: information insufficiency when reconstructing images from patch tokens alone, and information redundancy among patch tokens. To address these drawbacks, we introduce global tokens that model the mutual information across patch tokens, and a Dynamic Token Filtering (DTF) algorithm based on cumulative conditional entropy that eliminates redundancy. Experiments confirm TaTok's state-of-the-art performance, delivering a 1.3x gFID improvement and an 8.7x inference speedup. By allocating tokens according to information richness, TaTok enables more compressed yet accurate image tokenization, offering valuable insights for future research.
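The abstract does not spell out how DTF works, so the following is only a minimal sketch of one way a filter "based on cumulative conditional entropy" could behave, not the authors' algorithm. It assumes that patch tokens come in a fixed order and that, for each position i, some model (for example an autoregressive prior) supplies a predictive distribution over the codebook conditioned on tokens 0..i-1. By the chain rule, the sum of these conditional entropies is a per-sample estimate of the joint entropy of the token sequence. The rule keeps the shortest prefix whose cumulative conditional entropy reaches a fraction `keep_ratio` of that total. The function names and `keep_ratio` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def shannon_entropy_bits(probs: Sequence[float]) -> float:
    """Shannon entropy (in bits) of one discrete distribution."""
    h = -sum(p * math.log2(p) for p in probs if p > 0.0)
    return max(0.0, h)  # normalizes -0.0 and tiny negative rounding to 0.0


def dynamic_token_filter(
    cond_dists: Sequence[Sequence[float]], keep_ratio: float = 0.9
) -> Tuple[int, List[float]]:
    """Hypothetical DTF-style rule (an illustration, not TaTok's method).

    cond_dists[i] is ASSUMED to be the predictive distribution over the
    codebook for token i given tokens 0..i-1. Returns the number of leading
    tokens to keep and the per-token conditional entropies.
    """
    if not 0.0 < keep_ratio <= 1.0:
        raise ValueError("keep_ratio must be in (0, 1]")
    cond_h = [shannon_entropy_bits(d) for d in cond_dists]
    # Chain rule: the sum of conditional entropies estimates the joint entropy.
    total = sum(cond_h)
    if total == 0.0:
        # No token carries new information; keep at most one token.
        return min(1, len(cond_h)), cond_h
    cumulative = 0.0
    for i, h in enumerate(cond_h):
        cumulative += h
        # Stop once the prefix captures the requested share of information;
        # the remaining tokens are treated as redundant.
        if cumulative >= keep_ratio * total:
            return i + 1, cond_h
    return len(cond_h), cond_h


if __name__ == "__main__":
    dists = [
        [0.25, 0.25, 0.25, 0.25],  # 2 bits of new information
        [0.5, 0.5, 0.0, 0.0],      # 1 bit
        [1.0, 0.0, 0.0, 0.0],      # fully predictable from the prefix
        [1.0, 0.0, 0.0, 0.0],
    ]
    n_keep, entropies = dynamic_token_filter(dists, keep_ratio=0.9)
    print(n_keep, entropies)  # prints: 2 [2.0, 1.0, 0.0, 0.0]
```

In this toy example the last two tokens are fully determined by the earlier ones, so they add no conditional entropy and are dropped. Because the threshold is relative to each image's own total, images with less information keep fewer tokens. That matches the abstract's idea of allocating tokens according to information richness, but the actual TaTok criterion may differ.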

Subjects: Computer Vision and Pattern Recognition, Artificial Intelligence

Published: 2026-05-11 10:51:02 UTC