Moving Beyond Diversity: Visual Token Pruning as Subspace Reconstruction for Efficient VLMs

Authors: Jaeyeon Lee, Shunjie Wen, Dong-Wan Choi

Despite their remarkable performance, Vision Language Models (VLMs) incur substantial computational overhead due to the large number of visual tokens. While diversity maximization has become a dominant strategy for token reduction, existing methods rely on cosine-based normalized similarity that discards magnitude information, failing to faithfully approximate the original feature representation and leading to suboptimal performance, particularly on compositional multi-skill reasoning tasks. In this paper, we introduce SPARE, a subspace reconstruction method that reformulates token pruning as a column subset selection problem and explicitly minimizes reconstruction error. By iteratively selecting tokens with large projection residuals, SPARE performs reconstruction-driven pruning beyond angular diversity. Moreover, we reveal a counterintuitive anti-relevance phenomenon: tokens with lower image-text relevance scores can better preserve contextual information. Based on this finding, we incorporate anti-relevance into SPARE as an additional selection criterion to promote context-aware token selection. Extensive experiments across multiple VLMs and benchmarks demonstrate that SPARE consistently achieves state-of-the-art performance, with strong gains on compositional tasks. When applied to LLaVA, SPARE removes up to 94% of visual tokens while retaining 95% of the baseline performance, all in a fully training-free manner.
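
The abstract describes token pruning as column subset selection driven by projection residuals. The sketch below illustrates that general idea with a standard greedy residual selection in NumPy; it is not the authors' SPARE implementation. The function names `greedy_residual_select` and `reconstruction_error`, the absolute tolerance, and the toy data are assumptions made for illustration, and the anti-relevance criterion is omitted because the abstract does not specify how it is computed.

```python
import numpy as np


def greedy_residual_select(tokens: np.ndarray, k: int) -> list[int]:
    """Greedily pick up to k token indices (hypothetical helper).

    At each step the token with the largest projection residual onto the
    span of the already selected tokens is chosen, so magnitude matters,
    not only direction.
    """
    residual = tokens.astype(np.float64).copy()  # shape: (n_tokens, dim)
    selected: list[int] = []
    for _ in range(min(k, residual.shape[0])):
        # Squared norm of each token's residual.
        norms = np.einsum("ij,ij->i", residual, residual)
        if selected:
            norms[selected] = -1.0  # never re-pick a selected token
        idx = int(np.argmax(norms))
        if norms[idx] <= 1e-12:  # assumed tolerance
            break  # every remaining token already lies in the selected span
        selected.append(idx)
        # Remove the newly covered direction from all residuals.
        q = residual[idx] / np.sqrt(norms[idx])
        residual -= np.outer(residual @ q, q)
    return selected


def reconstruction_error(tokens: np.ndarray, selected: list[int]) -> float:
    """Frobenius error of reconstructing all tokens from the selected ones."""
    basis = tokens[selected].T  # shape: (dim, k)
    coeffs, *_ = np.linalg.lstsq(basis, tokens.T, rcond=None)
    return float(np.linalg.norm(tokens.T - basis @ coeffs))


# Toy features: tokens 0 and 1 point the same way but differ in magnitude.
toy_tokens = np.array([
    [2.0, 0.0, 0.0],
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 0.5],
])
picked = greedy_residual_select(toy_tokens, k=2)
error = reconstruction_error(toy_tokens, picked)
```

On this toy input the selection is `[0, 2]`: token 1 is fully explained by token 0, so it contributes no residual. The remaining error of 0.5 comes entirely from token 3, the only token left outside the selected span.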

Subject: Computer Vision and Pattern Recognition

Publish: 2026-06-17 04:45:10 UTC

2606.18681
