2503.23383

#1 ToRL: Scaling Tool-Integrated RL

Authors: Xuefeng Li, Haoyang Zou, Pengfei Liu

We introduce ToRL (Tool-Integrated Reinforcement Learning), a framework for training large language models (LLMs) to autonomously use computational tools via reinforcement learning. Unlike supervised fine-tuning, ToRL allows models to explore and discover optimal strategies for tool use. Experiments with Qwen2.5-Math models show significant improvements: ToRL-7B reaches 43.3% accuracy on AIME 24, surpassing reinforcement learning without tool integration by 14% and the best existing Tool-Integrated Reasoning (TIR) model by 17%. Further analysis reveals emergent behaviors such as strategic tool invocation, self-regulation of ineffective code, and dynamic adaptation between computational and analytical reasoning, all arising purely through reward-driven learning.

Subject: Computation and Language

Publish: 2025-03-30 10:16:25 UTC
