Reasoning through Exploration: A Reinforcement Learning Framework for Robust Function Calling

Authors: Bingguang Hao, Zengzhuang Xu, Maolin Wang, Yuntao Wen, Yicheng Chen, Cunyin Peng, Long Chen, Dong Wang, Xiangyu Zhao, Jinjie Gu, Chenyi Zhuang, Ji Zhang

The effective training of Large Language Models (LLMs) for function calling faces a critical challenge: balancing exploration of complex reasoning paths with stable policy optimization. Standard methods like Supervised Fine-Tuning (SFT) fail to instill robust reasoning, and traditional Reinforcement Learning (RL) struggles with inefficient exploration. We propose EGPO, a new RL framework built upon Group Relative Policy Optimization (GRPO), designed to address this challenge directly. The core of EGPO is an entropy-enhanced advantage function that integrates the entropy of the model's Chain-of-Thought (CoT) into the policy gradient computation. This encourages the generation of diverse reasoning strategies. To maintain optimization direction, the entropy bonus is carefully constrained by a clipping mechanism. Complemented by a strict, binary reward signal, EGPO effectively guides the model towards discovering structured and accurate tool invocation patterns. On the challenging Berkeley Function Calling Leaderboard (BFCL), a 4B-parameter model trained with EGPO sets a new state-of-the-art among models of comparable size, surpassing a range of strong competitors, including GPT-4o and Gemini-2.5.
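
The abstract does not give the exact form of the entropy-enhanced advantage, so the following is only a minimal sketch of how such a term might look, not the authors' implementation. It assumes a GRPO-style group-normalized advantage computed from binary rewards, a CoT entropy taken as the mean per-token Shannon entropy, and a bonus capped at a fraction of the advantage's magnitude so that the sign of the advantage (the optimization direction) cannot flip. The names `alpha` and `clip_ratio`, their default values, and all function names are hypothetical.

```python
import math
from statistics import mean, pstdev


def token_entropy(probs):
    """Shannon entropy (in nats) of one next-token distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def cot_entropy(token_dists):
    """Mean per-token entropy over the CoT tokens of one response (assumed definition)."""
    return mean(token_entropy(d) for d in token_dists)


def group_relative_advantages(rewards, eps=1e-6):
    """GRPO-style advantage: reward normalized by the group's mean and std."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def entropy_enhanced_advantages(rewards, entropies, alpha=0.1, clip_ratio=0.5):
    """Add a clipped entropy bonus to each group-relative advantage.

    Assumption: the bonus alpha * H is capped at clip_ratio * |A|. With
    clip_ratio < 1 the bonus can never change the sign of A.
    """
    base = group_relative_advantages(rewards)
    enhanced = []
    for adv, ent in zip(base, entropies):
        cap = clip_ratio * abs(adv)
        bonus = min(alpha * ent, cap)  # entropy >= 0, so bonus >= 0
        enhanced.append(adv + bonus)
    return enhanced


# Four sampled responses to one prompt: binary rewards and CoT entropies.
rewards = [1.0, 0.0, 0.0, 1.0]
entropies = [2.0, 2.0, 0.0, 0.0]
advantages = entropy_enhanced_advantages(rewards, entropies)
```

In this sketch, a correct response with a high-entropy CoT receives a larger advantage than an equally correct low-entropy one, while an incorrect response keeps a negative advantage regardless of its entropy. When every response in a group receives the same reward, the base advantage and the cap are both zero, so no bonus is applied.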

Subjects: Machine Learning, Artificial Intelligence, Computation and Language

Published: 2025-08-07 07:51:38 UTC

2508.05118
