Budget-aware Test-time Scaling via Discriminative Verification

Authors: Kyle Montgomery, Sijun Tan, Yuqi Chen, Siyuan Zhuang, Tianjun Zhang, Raluca Ada Popa, Chenguang Wang

Test-time scaling is a powerful strategy for boosting the performance of large language models on complex reasoning tasks. While state-of-the-art approaches often employ generative verifiers to select the best solution from a pool of candidates, this method incurs prohibitive computational costs, limiting its practicality. In this work, we shift the focus to a more budget-aware paradigm: discriminative verification. We conduct a thorough empirical analysis and demonstrate that while discriminative verifiers may underperform in isolation, combining them with self-consistency in a hybrid approach creates a powerful and efficient test-time scaling mechanism. Notably, under a fixed compute budget, this hybrid approach surpasses state-of-the-art generative verification by a significant margin, achieving up to 15.3% higher accuracy on AIME2025. Our findings establish that for practical, real-world applications, budget-aware scaling with discriminative verifiers is not only a "free" upgrade over self-consistency, but also a more effective and efficient alternative to costly generative techniques. Code is available at https://github.com/wang-research-lab/verification.
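The sketch below illustrates the contrast between self-consistency and a hybrid selector. It assumes one common hybrid rule, verifier-weighted voting: each candidate's final answer receives a vote equal to its discriminative verifier score, and the answer with the highest total wins. This rule is an illustrative assumption and may not be the exact method used in the paper or its repository. The helper names `majority_vote` and `weighted_vote` are hypothetical, and the scores are made-up stand-ins for a discriminative verifier's outputs.

```python
from collections import defaultdict
from typing import Dict, Sequence


def majority_vote(answers: Sequence[str]) -> str:
    """Plain self-consistency: return the most frequent final answer."""
    counts: Dict[str, int] = defaultdict(int)
    for answer in answers:
        counts[answer] += 1
    # max() returns the first maximal key in insertion order, so ties
    # go to the answer that appeared first among the candidates.
    return max(counts, key=counts.get)


def weighted_vote(answers: Sequence[str], scores: Sequence[float]) -> str:
    """Hybrid selection (assumed form): sum verifier scores per distinct answer."""
    if len(answers) != len(scores):
        raise ValueError("answers and scores must have the same length")
    totals: Dict[str, float] = defaultdict(float)
    for answer, score in zip(answers, scores):
        # Each candidate votes for its final answer with weight = verifier score.
        totals[answer] += score
    return max(totals, key=totals.get)


if __name__ == "__main__":
    # Hypothetical final answers from five sampled solutions, with
    # hypothetical discriminative-verifier scores in [0, 1].
    answers = ["42", "17", "42", "17", "17"]
    scores = [0.9, 0.1, 0.8, 0.2, 0.3]
    print("self-consistency:", majority_vote(answers))
    print("verifier-weighted:", weighted_vote(answers, scores))
```

In this made-up example, plain self-consistency picks "17" (three votes to two). The verifier-weighted vote picks "42" (total score 1.7 versus about 0.6). A discriminative verifier typically emits a single score per candidate rather than generating a verification chain, so this reweighting adds little cost on top of sampling the candidates.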

Subjects: Artificial Intelligence, Computation and Language, Machine Learning

Published: 2025-10-16 17:30:02 UTC

arXiv: 2510.14913