2506.04118

#1 Guided Speculative Inference for Efficient Test-Time Alignment of LLMs

Authors: Jonathan Geuter, Youssef Mroueh, David Alvarez-Melis

We propose Guided Speculative Inference (GSI), a novel algorithm for efficient reward-guided decoding in large language models. GSI combines soft best-of-$n$ test-time scaling with a reward model $r(x,y)$ and speculative samples from a small auxiliary model $\pi_S(y \mid x)$. We provably approximate the optimal tilted policy $\pi_{\beta,B}(y \mid x) \propto \pi_B(y \mid x)\exp(\beta\, r(x,y))$ of soft best-of-$n$ under the primary model $\pi_B$. We derive a theoretical bound on the KL divergence between our induced distribution and the optimal policy. In experiments on reasoning benchmarks (MATH500, OlympiadBench, Minerva Math), our method achieves higher accuracy than standard soft best-of-$n$ with $\pi_S$ and reward-guided speculative decoding (Liao et al., 2025), and in certain settings even outperforms soft best-of-$n$ with $\pi_B$. The code is available at https://github.com/j-geuter/GSI .
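As a rough illustration of the soft best-of-$n$ step the abstract refers to, the sketch below selects one of $n$ candidate responses with probability proportional to $\exp(\beta\, r(x,y))$. It is not the GSI algorithm and is not taken from the linked repository: the function name `soft_best_of_n`, the toy candidates, and the reward values are all hypothetical, and the correction that GSI applies to account for drawing candidates from $\pi_S$ rather than $\pi_B$ is omitted.

```python
import math
import random


def soft_best_of_n(candidates, rewards, beta, rng=random):
    """Sample one candidate with probability proportional to exp(beta * reward).

    Hypothetical helper for illustration only; it assumes the candidates
    were already drawn from some base model and scored by a reward model.
    """
    if not candidates or len(candidates) != len(rewards):
        raise ValueError("candidates and rewards must be non-empty and equal in length")

    # Subtract the maximum reward before exponentiating for numerical stability.
    max_reward = max(rewards)
    weights = [math.exp(beta * (r - max_reward)) for r in rewards]
    total = sum(weights)
    probs = [w / total for w in weights]

    # Draw a single candidate according to the softmax weights.
    choice = rng.choices(candidates, weights=probs, k=1)[0]
    return choice, probs


# Toy usage: three placeholder responses with made-up reward scores.
responses = ["answer A", "answer B", "answer C"]
scores = [0.1, 0.9, 0.5]
picked, probabilities = soft_best_of_n(responses, scores, beta=5.0, rng=random.Random(0))
```

With `beta = 0` the selection is uniform over the candidates, and as `beta` grows it approaches ordinary (hard) best-of-$n$, which always returns the highest-reward candidate.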

Subjects: Machine Learning

Publish: 2025-06-04 16:12:26 UTC