Cr9qfD3qRc@OpenReview

#1 Not All Tokens Matter All The Time: Dynamic Token Aggregation Towards Efficient Detection Transformers

Authors: Jiacheng Cheng, Xiwen Yao, Xiang Yuan, Junwei Han

The substantial computational demands of detection transformers (DETRs) hinder their deployment in resource-constrained scenarios, with the encoder consistently emerging as a critical bottleneck. A promising solution lies in reducing token redundancy within the encoder. However, existing methods perform static sparsification and ignore the varying importance of tokens across feature levels and encoder blocks, leading to suboptimal sparsification and performance degradation. In this paper, we propose **Dynamic DETR** (**Dynamic** token aggregation for **DE**tection **TR**ansformers), a novel strategy that leverages the inherent importance distribution of tokens to control token density and performs multi-level token sparsification. Within each stage, we apply a proximal aggregation paradigm to low-level tokens to maintain spatial integrity, and a holistic strategy to high-level tokens to capture broader contextual information. Furthermore, we propose a center-distance regularization that aligns the token distribution throughout the sparsification process, thereby promoting representation consistency and effectively preserving critical object-specific patterns. Extensive experiments on canonical DETR models demonstrate that Dynamic DETR is broadly applicable across various models and consistently outperforms existing token sparsification methods.

Subject: ICML.2025 - Poster
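
The level-dependent aggregation described in the abstract can be pictured with a small, self-contained sketch. This is not the authors' implementation: the importance score, the window pooling used for proximal aggregation, the top-k-plus-context rule used for holistic aggregation, and all helper names (`score_tokens`, `proximal_aggregate`, `holistic_aggregate`, `center_distance_loss`) are assumptions introduced here purely for illustration.

```python
# Illustrative sketch of importance-driven token aggregation for a DETR encoder.
# NOT the paper's method; scoring and aggregation rules are assumed stand-ins.
import torch
import torch.nn.functional as F


def score_tokens(tokens: torch.Tensor) -> torch.Tensor:
    """Assign an importance score per token (here: a simple L2-norm proxy)."""
    return tokens.norm(dim=-1)  # (B, N)


def proximal_aggregate(tokens: torch.Tensor, keep_ratio: float, window: int = 2) -> torch.Tensor:
    """Low-level tokens: pool neighbouring tokens so the spatial layout is
    coarsened rather than discarded, then keep the most salient pooled tokens."""
    B, N, C = tokens.shape
    n_keep = max(1, int(N * keep_ratio))
    pooled = F.avg_pool1d(tokens.transpose(1, 2), kernel_size=window, stride=window)
    pooled = pooled.transpose(1, 2)                      # (B, N // window, C)
    scores = score_tokens(pooled)
    idx = scores.topk(min(n_keep, pooled.shape[1]), dim=1).indices
    return torch.gather(pooled, 1, idx.unsqueeze(-1).expand(-1, -1, C))


def holistic_aggregate(tokens: torch.Tensor, keep_ratio: float) -> torch.Tensor:
    """High-level tokens: keep the globally most important tokens and fold the
    remainder into a single context token that summarises broader context."""
    B, N, C = tokens.shape
    n_keep = max(1, int(N * keep_ratio))
    scores = score_tokens(tokens)
    idx = scores.topk(n_keep, dim=1).indices
    kept = torch.gather(tokens, 1, idx.unsqueeze(-1).expand(-1, -1, C))
    mask = torch.ones(B, N, dtype=torch.bool, device=tokens.device)
    mask.scatter_(1, idx, False)                          # True = dropped tokens
    context = (tokens * mask.unsqueeze(-1)).sum(1, keepdim=True)
    context = context / mask.sum(1, keepdim=True).clamp(min=1).unsqueeze(-1)
    return torch.cat([kept, context], dim=1)


def center_distance_loss(before: torch.Tensor, after: torch.Tensor) -> torch.Tensor:
    """Penalise drift between token centroids before and after sparsification,
    a crude stand-in for the paper's center-distance regularization."""
    return F.mse_loss(before.mean(dim=1), after.mean(dim=1))


if __name__ == "__main__":
    low = torch.randn(2, 64, 256)    # fine-grained (low-level) tokens
    high = torch.randn(2, 16, 256)   # coarse (high-level) tokens
    low_sparse = proximal_aggregate(low, keep_ratio=0.5)
    high_sparse = holistic_aggregate(high, keep_ratio=0.5)
    reg = center_distance_loss(low, low_sparse) + center_distance_loss(high, high_sparse)
    print(low_sparse.shape, high_sparse.shape, float(reg))
```

The sketch is only meant to show how per-token importance scores can drive different aggregation rules at different feature levels; the paper's actual token-density control, multi-level scheduling across encoder blocks, and regularization are more involved.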