2503.13377


TimeZero: Temporal Video Grounding with Reasoning-Guided LVLM

Authors: Ye Wang, Boshen Xu, Zihao Yue, Zihan Xiao, Ziheng Wang, Liang Zhang, Dingyi Yang, Wenxuan Wang, Qin Jin

We introduce TimeZero, a reasoning-guided large vision-language model (LVLM) designed for the temporal video grounding (TVG) task. TVG requires precisely localizing the video segments relevant to a given language query within long videos. TimeZero tackles this challenge by extending the inference process, enabling the model to reason about video-language relationships solely through reinforcement learning. To evaluate the effectiveness of TimeZero, we conduct experiments on two benchmarks, where TimeZero achieves state-of-the-art performance on Charades-STA. Code is available at https://github.com/www-Ye/TimeZero.
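For readers new to TVG, "precisely localizing" a segment is usually scored by temporal intersection-over-union (IoU) between a predicted [start, end] interval and the ground-truth interval, and benchmarks such as Charades-STA commonly report Recall@1 at IoU thresholds like 0.5. The sketch below is a minimal illustration of that standard metric under those assumptions; the function names are hypothetical, and the abstract does not state which reward or evaluation code TimeZero itself uses.

```python
def temporal_iou(pred, gt):
    """IoU of two (start, end) segments, e.g. in seconds (hypothetical helper)."""
    ps, pe = pred
    gs, ge = gt
    # Overlap length; zero when the segments are disjoint.
    inter = max(0.0, min(pe, ge) - max(ps, gs))
    # Union length = sum of both lengths minus the overlap.
    union = (pe - ps) + (ge - gs) - inter
    return inter / union if union > 0 else 0.0


def recall_at_iou(preds, gts, threshold=0.5):
    """Fraction of queries whose top-1 prediction reaches the IoU threshold."""
    hits = sum(temporal_iou(p, g) >= threshold for p, g in zip(preds, gts))
    return hits / len(gts)


if __name__ == "__main__":
    # Overlap 4 s, union 8 s, so IoU is 0.5.
    print(temporal_iou((2.0, 8.0), (4.0, 10.0)))
    # One hit (IoU 0.5) and one miss (disjoint), so recall is 0.5.
    print(recall_at_iou([(2.0, 8.0), (0.0, 1.0)], [(4.0, 10.0), (5.0, 6.0)]))
```

An IoU of this kind can also serve as a scalar reward when training a grounding model with reinforcement learning, though that use here is an assumption rather than a detail taken from the abstract.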

Subjects: Computer Vision and Pattern Recognition, Artificial Intelligence, Computation and Language

Published: 2025-03-17 17:04:20 UTC