CqLWckpTbG@OpenReview


DeepDiver: Adaptive Web-Search Intensity Scaling via Reinforcement Learning

Authors: Wenxuan Shi, Haochen Tan, Chuqiao Kuang, Xiaoguang Li, Hanting Chen, Xiaozhe Ren, Yasheng Wang, Lu Hou, Lifeng Shang

Information seeking demands iterative evidence gathering and reflective reasoning, yet large language models (LLMs) still struggle with it in open-web question answering. Existing prompting and supervised fine-tuning (SFT) methods are constrained by fixed prompt rules or training corpora, and they are usually benchmarked only on well-structured wiki sources, which limits their real-world adaptability. We introduce $\textbf{WebPuzzle}$, a benchmark with 24k training samples and 275 test samples that evaluates information seeking on the live internet across both wiki and open-domain queries. Using 7k WebPuzzle instances, we develop $\textbf{DeepDiver}$, a reinforcement learning (RL) framework that cultivates $\textbf{Search Intensity Scaling (SIS)}$, an emergent ability to escalate search frequency and depth instead of settling on overconfident, under-evidenced answers. With SIS, Qwen2.5-7B-Instruct and Pangu-7B-Reasoner achieve performance on real-web tasks comparable to that of the 671B-parameter DeepSeek-R1. We detail DeepDiver’s curriculum, from cold-start SFT to a well-designed RL procedure, and show that its seeking policy generalizes from closed-ended queries to open-ended generation such as long-form writing. Our results advance adaptive information seeking in LLMs and provide a rigorous benchmark for future work.

Subject: NeurIPS.2025 - Spotlight