IAPO: Information-Aware Policy Optimization for Token-Efficient Reasoning


Authors: Yinhan He, Yaochen Zhu, Mingjia Shi, Wendy Zheng, Lin Su, Xiaoqing Wang, Qi Guo, Jundong Li

Large language models increasingly rely on long chains of thought to improve accuracy, yet such gains come with substantial inference-time costs. We revisit token-efficient post-training and argue that existing sequence-level reward-shaping methods offer limited control over how reasoning effort is allocated across tokens. To bridge this gap, we propose IAPO, an information-theoretic post-training framework that assigns token-wise advantages based on each token's conditional mutual information (MI) with the final answer. This yields an explicit, principled mechanism for identifying informative reasoning steps and suppressing low-utility exploration. We provide a theoretical analysis showing that IAPO can induce monotonic reductions in reasoning verbosity without harming correctness. Empirically, IAPO consistently improves reasoning accuracy while reducing reasoning length by up to 36%, outperforming existing token-efficient RL methods across various reasoning datasets. Extensive empirical evaluations demonstrate that information-aware advantage shaping is a powerful and general direction for token-efficient post-training. The code is available at https://github.com/YinhanHe123/IAPO.
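To make the idea of token-wise, information-based advantages concrete, here is a minimal sketch. It is not the authors' implementation: the abstract does not specify the MI estimator or the shaping rule, so everything below is an assumption for illustration. The sketch assumes that a distribution over candidate final answers is available after each reasoning prefix, uses the realized drop in answer entropy as a single-sample proxy for a token's conditional MI with the answer, and adds a mean-centered bonus to a sequence-level advantage. The names `info_gains`, `token_advantages`, and `beta` are hypothetical.

```python
import math

def entropy(p):
    """Shannon entropy (in nats) of a discrete distribution."""
    return -sum(q * math.log(q) for q in p if q > 0)

def info_gains(answer_dists):
    """Per-token drop in answer entropy.

    answer_dists[0] is the assumed answer distribution given the prompt only;
    answer_dists[i] is the distribution after the first i reasoning tokens.
    The drop H[i] - H[i+1] is a single-sample proxy for the conditional MI
    between token i+1 and the final answer (an assumption of this sketch).
    """
    H = [entropy(p) for p in answer_dists]
    return [H[i] - H[i + 1] for i in range(len(H) - 1)]

def token_advantages(seq_adv, gains, beta=1.0):
    """Hypothetical shaping rule: sequence advantage plus a centered info bonus.

    Centering keeps the mean token advantage equal to seq_adv, so tokens with
    below-average information gain are pushed down and informative ones up.
    """
    mean_gain = sum(gains) / len(gains)
    return [seq_adv + beta * (g - mean_gain) for g in gains]

# Toy example: two candidate answers, three reasoning tokens.
# The first token leaves the answer distribution unchanged (uninformative).
dists = [[0.5, 0.5], [0.5, 0.5], [0.9, 0.1], [1.0, 0.0]]
gains = info_gains(dists)
advs = token_advantages(seq_adv=1.0, gains=gains, beta=1.0)
print(gains)
print(advs)
```

In this toy run the uninformative first token receives a lower advantage than the two tokens that sharpen the answer distribution, which is the qualitative behavior the abstract describes as suppressing low-utility exploration.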

Subjects: Computation and Language, Machine Learning

Published: 2026-02-22 05:30:14 UTC

2602.19049
