Think on your Feet: Adaptive Thinking via Reinforcement Learning for Social Agents

Authors: Minzheng Wang, Yongbin Li, Haobo Wang, Xinghua Zhang, Nan Xu, Bingli Wu, Fei Huang, Haiyang Yu, Wenji Mao

Effective social intelligence simulation requires language agents to dynamically adjust reasoning depth, a capability notably absent in current approaches. Existing methods either lack this kind of reasoning capability or enforce uniform long chain-of-thought reasoning across all scenarios, resulting in excessive token usage and inappropriate social simulation. In this paper, we propose $\textbf{A}$daptive $\textbf{M}$ode $\textbf{L}$earning ($\textbf{AML}$), which strategically selects from four thinking modes (intuitive reaction $\rightarrow$ deep contemplation) based on real-time context. Our framework's core innovation, the $\textbf{A}$daptive $\textbf{M}$ode $\textbf{P}$olicy $\textbf{O}$ptimization ($\textbf{AMPO}$) algorithm, introduces three key advancements over existing methods: (1) multi-granular thinking mode design, (2) context-aware mode switching across social interactions, and (3) token-efficient reasoning via depth-adaptive processing. Extensive experiments on social intelligence tasks confirm that AML achieves 15.6% higher task performance than state-of-the-art methods. Notably, our method outperforms GRPO by 7.0% with 32.8% shorter reasoning chains. These results demonstrate that context-sensitive thinking mode selection, as implemented in AMPO, enables more human-like adaptive reasoning than GRPO's fixed-depth approach.
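To make the idea of choosing among four thinking modes concrete, the following is a minimal illustrative sketch, not the paper's AMPO algorithm. In the paper the selection is learned with reinforcement learning; here a fixed threshold rule is swapped in purely for illustration. The `complexity` score, the `select_mode` function, and the two middle mode labels are hypothetical; only the endpoint names (intuitive reaction, deep contemplation) come from the abstract.

```python
from enum import IntEnum


class ThinkingMode(IntEnum):
    """Four thinking modes ordered from shallowest to deepest.

    Only the two endpoints are named in the abstract; MODE_2 and MODE_3
    are placeholder labels (assumption).
    """
    INTUITIVE_REACTION = 1
    MODE_2 = 2
    MODE_3 = 3
    DEEP_CONTEMPLATION = 4


def select_mode(complexity: float) -> ThinkingMode:
    """Map a hypothetical context-complexity score in [0, 1] to a mode.

    This fixed bucketing rule stands in for a learned policy (assumption).
    """
    if not 0.0 <= complexity <= 1.0:
        raise ValueError("complexity must be in [0, 1]")
    # Split [0, 1] into four equal buckets; min() keeps 1.0 in the deepest mode.
    return ThinkingMode(min(int(complexity * 4) + 1, 4))


if __name__ == "__main__":
    for score in (0.0, 0.3, 0.5, 1.0):
        print(score, select_mode(score).name)
```

Under this toy rule, simple turns get a short response mode and only the hardest contexts trigger the deepest mode, which is the intuition behind spending fewer reasoning tokens than a uniform long chain-of-thought.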

Subjects: Computation and Language , Artificial Intelligence , Machine Learning

Published: 2025-05-04 15:39:58 UTC

2505.02156
