ViRFgwVjk0@OpenReview

Total: 1

#1 AdaptiveStep: Automatically Dividing Reasoning Step through Model Confidence

Authors: Yuliang Liu, Junjie Lu, Chaofeng Qu, Zhaoling Chen, Zefan Cai, Jason Liu, Chonghan Liu, Yunhui Xia, Li Zhao, Jiang Bian, Chuheng Zhang, Wei Shen, Zhouhan Lin

Current approaches for training Process Reward Models (PRMs) often involve decomposing responses into multiple reasoning steps using rule-based techniques, such as relying on predefined placeholder tokens or fixing the length of each reasoning step. These approaches overlook the fact that certain words rarely mark genuine decision points in a text. To address this, we propose AdaptiveStep, a method that divides reasoning steps based on the model's confidence in predicting the next word. This division supplies more decision-making information at each step and improves downstream tasks such as reward model training. Moreover, our method requires no manual annotation. Experiments with AdaptiveStep-trained PRMs on mathematical reasoning and code generation show that the resulting PRM achieves state-of-the-art Best-of-N performance, surpassing the greedy search strategy via token-level value-guided decoding, while also reducing construction costs by over 30% compared to existing open-source PRMs. We also provide a thorough analysis and case study of its performance, transferability, and generalization capabilities. Our code is available at https://github.com/Lux0926/ASPRM.
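The core idea, splitting a response where the model is least confident about the next token, can be illustrated with a short sketch. The Python snippet below is a minimal illustration under stated assumptions, not the authors' implementation: it assumes a Hugging Face causal LM, takes the probability the model assigns to each realized next token as the confidence signal, and uses a hypothetical `threshold` value as the split criterion.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

def split_by_confidence(model, tokenizer, text, threshold=0.85):
    """Split `text` into steps at tokens where the model's next-token
    confidence drops below `threshold`.

    Confidence here is the probability the model assigned to the token
    that actually appears next; both this choice and the threshold
    value are illustrative assumptions, not the paper's exact recipe.
    """
    ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(ids).logits  # shape: [1, seq_len, vocab]

    # Probability distribution over the next token at each position,
    # then the probability of the token that was actually realized.
    probs = torch.softmax(logits[0, :-1], dim=-1)
    actual = ids[0, 1:]
    conf = probs.gather(-1, actual.unsqueeze(-1)).squeeze(-1)

    steps, start = [], 0
    for t, c in enumerate(conf.tolist(), start=1):
        if c < threshold:  # low confidence -> candidate decision point
            steps.append(tokenizer.decode(ids[0, start:t + 1]))
            start = t + 1
    if start < ids.shape[1]:
        steps.append(tokenizer.decode(ids[0, start:]))
    return steps

# Example usage (model choice is an assumption for illustration):
# model = AutoModelForCausalLM.from_pretrained("gpt2")
# tokenizer = AutoTokenizer.from_pretrained("gpt2")
# for step in split_by_confidence(model, tokenizer, "First, factor x^2-1 ..."):
#     print(repr(step))
```

Each returned chunk would then serve as one "reasoning step" for downstream PRM data construction; in practice the threshold would be calibrated rather than fixed by hand.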

Subject: ICML.2025 - Poster