Chain-of-thought reasoning enables large reasoning models (LRMs) to work through multi-step problems, but it often produces unnecessarily long or redundant reasoning traces, a phenomenon known as overthinking. This inflates inference costs and can degrade answer quality. To address these challenges, we propose Confidence-Aware Reasoning (CAR), an inference-time framework that optimizes reasoning trajectories by selectively pruning low-utility reasoning blocks and halting early once sufficient confidence has been reached. CAR is theoretically grounded in Bayesian optimal experimental design: each reasoning block is treated as a sequential decision whose utility is approximated by its marginal contribution to reducing uncertainty about the final answer. We introduce a lightweight implementation that leverages token-level confidence to dynamically modulate reasoning depth without additional supervision. Evaluations on multiple benchmarks, including AMC, AIME, GPQA-Diamond, and MATH-500, show that CAR improves answer accuracy by up to +13.3% while reducing average reasoning length by 40%–50%. Our findings demonstrate that information-theoretic insights can effectively control self-guided reasoning and enable LRMs to “think just enough” at test time.
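As a minimal sketch of this framing (the notation $A$, $b_t$, and the threshold $\tau$ are our own assumptions for illustration, not taken from the paper): with $A$ the final answer and $b_1,\dots,b_t$ the reasoning blocks generated so far, a block's utility can be approximated by the reduction in answer uncertainty it induces, and generation halts once the model's answer confidence clears a threshold:
\[
U(b_t) \;\approx\; H\!\left(A \mid b_{<t}\right) \;-\; H\!\left(A \mid b_{\le t}\right),
\qquad
\text{halt at the first } t \text{ with } \max_{a} p\!\left(a \mid b_{\le t}\right) \ge \tau .
\]
Under this reading, pruning corresponds to skipping candidate blocks whose estimated $U(b_t)$ is near zero, while early halting corresponds to the confidence test on the right; the paper's lightweight implementation approximates these quantities with token-level confidence rather than explicit entropies.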