StepGuard: Guarding Web Navigation via Single-Step Calibration

Authors: Zhihao Cui, Yuchen Zhang, Xiyang Sun, Yaxiong Wang, Li Zhu, Jinpeng Hu, Liu Liu, Mengjia Li, Yujiao Wu

Web navigation requires agents to follow natural language goals, interact with web pages, and produce accurate answers. Although recent methods leverage vision-language models and reinforcement learning, they still suffer from single-step fragility caused by reward misalignment and error propagation. To address reward entanglement, we design Dynamic Dual-Policy Optimization (DDPO), which dynamically switches between a navigation-first mode for exploration and an answer-first mode for question answering, mitigating reward conflict. To calibrate single-step errors, we propose Confidence-Guided Adaptive Navigation Reflection (CANR), a mechanism that estimates per-step confidence, triggers reflection only when necessary, and uses contrastive rewards to encourage self-correction. Combining these two components, we develop StepGuard, a framework for guarding web navigation via single-step calibration. Experiments show that our approach significantly improves navigation and answer accuracy, setting a new state of the art on standard web navigation benchmarks.
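
The abstract does not say how CANR estimates per-step confidence or when reflection fires. As a rough illustration of confidence-gated reflection in general, the sketch below assumes confidence is the maximum softmax probability over candidate-action scores and that reflection triggers below a fixed threshold. The function names and the 0.6 threshold are hypothetical and are not taken from the paper.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over candidate-action scores."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def step_confidence(logits: Sequence[float]) -> float:
    """Assumed confidence proxy: probability of the top-scoring action."""
    return max(softmax(logits))


def should_reflect(logits: Sequence[float], threshold: float = 0.6) -> bool:
    """Trigger reflection only when the step looks uncertain.

    The threshold is a hypothetical value chosen for illustration.
    """
    return step_confidence(logits) < threshold
```

Under these assumptions, a step with one clearly dominant action skips reflection, while a step whose candidate actions score about equally triggers it. This keeps the extra reflection cost confined to uncertain steps.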

Subject: Artificial Intelligence

Publish: 2026-06-16 12:42:09 UTC

2606.17871
