Test-Time Training for Robust Text-Guided Open-Vocabulary Object Counting


Authors: Hao-Yuan Ma, Yuda Zou, Li Zhang, Yongchao Xu

Text-guided Open-vocabulary Object Counting (TOOC) enables counting arbitrary object categories specified by text prompts, offering substantially greater flexibility than conventional closed-set counting. However, existing TOOC methods are developed and evaluated primarily on ideal images, while real-world scenes often suffer from adverse conditions such as rain, fog, darkness, and sensor noise, which severely degrade visual quality and impair vision-language alignment. To bridge this gap, we introduce Robust-TOOC, the first benchmark for evaluating TOOC under diverse corruption conditions, which covers six representative degradation types: rain, fog, darkness, Gaussian noise, salt-and-pepper noise, and mixed corruption. To improve robustness while preserving the original counting architecture, we propose Dual-TTT, a dual-architecture test-time training framework for TOOC. Specifically, during test-time training, Dual-TTT updates only the Text-guided Lightweight Denoising module (TL-Denoiser), while keeping the original counting network frozen. Inspired by diffusion models, the TL-Denoiser is optimized to remove corruption-aware noise from image representations under degraded conditions. Since only the TL-Denoiser is trained at test time, Dual-TTT is annotation-free and can be seamlessly integrated into existing TOOC models without modifying their original architecture. Extensive experiments on multiple recent TOOC baselines demonstrate the effectiveness of our method.
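To make two of the listed degradation types concrete, the following is a minimal sketch of Gaussian noise, salt-and-pepper noise, and a simple combination of the two, applied to a grayscale image stored as nested lists with values in [0, 1]. The function names, the `sigma` and `amount` defaults, and the way the mixed corruption is composed are illustrative assumptions; the abstract does not state how Robust-TOOC generates its corruptions or at what severity.

```python
import random


def add_gaussian_noise(image, sigma=0.1, seed=0):
    """Add zero-mean Gaussian noise to each pixel, then clamp to [0, 1].

    `sigma` is an assumed severity, not a value taken from the benchmark.
    """
    rng = random.Random(seed)
    return [
        [min(1.0, max(0.0, px + rng.gauss(0.0, sigma))) for px in row]
        for row in image
    ]


def add_salt_and_pepper(image, amount=0.05, seed=0):
    """Replace roughly a fraction `amount` of pixels with 0.0 or 1.0.

    Each selected pixel becomes salt (1.0) or pepper (0.0) with equal
    probability; `amount` is an assumed severity.
    """
    rng = random.Random(seed)
    noisy = []
    for row in image:
        new_row = []
        for px in row:
            if rng.random() < amount:
                new_row.append(1.0 if rng.random() < 0.5 else 0.0)
            else:
                new_row.append(px)
        noisy.append(new_row)
    return noisy


def add_mixed(image, sigma=0.1, amount=0.05, seed=0):
    """Hypothetical mixed corruption: Gaussian noise, then salt-and-pepper."""
    blurred = add_gaussian_noise(image, sigma=sigma, seed=seed)
    return add_salt_and_pepper(blurred, amount=amount, seed=seed + 1)
```

Calling `add_mixed(image)` on any rectangular list of rows returns a new image of the same shape; the input is left unmodified, so the clean and corrupted versions can be compared side by side.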

Subject: Computer Vision and Pattern Recognition

Publish: 2026-06-16 07:08:02 UTC

2606.17601
