DIVHwy7wfh@OpenReview

Total: 1

#1 DOVTrack: Data-Efficient Open-Vocabulary Tracking

Authors: Zekun Qian, Ruize Han, Zhixiang Wang, Junhui Hou, Wei Feng

Open-Vocabulary Multi-Object Tracking (OVMOT) aims to detect and track objects across many categories, including categories unseen during training. A significant challenge in this domain is currently the lack of large-scale annotated video data for training. To address this challenge, this work aims to effectively train an OV tracker using only the existing limited and sparsely annotated video data. We propose a comprehensive training-sample-space expansion strategy that addresses the fundamental limitation of sparse annotations in OVMOT training. Specifically, for the association task, we develop a diffusion-based feature generation framework that synthesizes intermediate object features between sparsely annotated frames, expanding the training sample space by approximately 3× and enabling robust association learning from temporally continuous features. For the detection task, we introduce a dynamic group contrastive learning approach that generates diverse sample groups through affinity, dispersion, and adversarial grouping strategies, tripling the effective training samples for classification while maintaining sample quality. Additionally, we propose an adaptive localization loss that expands positive-sample coverage by lowering IoU thresholds while mitigating noise through confidence-based weighting. Extensive experiments demonstrate that our method achieves state-of-the-art performance on the OVMOT benchmark, surpassing existing methods by 3.8% on the TETA metric without requiring additional data or annotations. The code will be available at https://github.com/zekunqian/DOVTrack.
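The adaptive localization idea described above (lower the IoU threshold to admit more positive samples, then down-weight low-confidence ones to suppress the extra noise) can be sketched as follows. This is a minimal illustration, not the paper's actual loss: the function names, the 0.4 threshold, and the use of raw detection scores as weights are all hypothetical assumptions.

```python
import numpy as np

def iou(boxes, gt):
    """IoU between N predicted boxes [x1, y1, x2, y2] and one GT box."""
    x1 = np.maximum(boxes[:, 0], gt[0])
    y1 = np.maximum(boxes[:, 1], gt[1])
    x2 = np.minimum(boxes[:, 2], gt[2])
    y2 = np.minimum(boxes[:, 3], gt[3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area_b = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    area_g = (gt[2] - gt[0]) * (gt[3] - gt[1])
    return inter / (area_b + area_g - inter)

def adaptive_localization_weights(boxes, gt, scores, iou_thresh=0.4):
    """Hypothetical sketch: select positives with a lowered IoU threshold
    (vs. a strict 0.5 cut), then weight each positive by its detection
    confidence so likely-noisy low-confidence positives contribute less."""
    ious = iou(boxes, gt)
    pos = ious >= iou_thresh              # expanded positive-sample coverage
    weights = np.zeros_like(ious)
    weights[pos] = scores[pos]            # confidence-based noise mitigation
    return pos, weights
```

The returned weights would then scale a per-box regression loss, so the extra positives admitted by the lower threshold help coverage without letting noisy matches dominate training.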

Subject: NeurIPS.2025 - Poster