SEARL: Joint Optimization of Policy and Tool Graph Memory for Self-Evolving Agents

Authors: Xinshun Feng, Xinhao Song, Lijun Li, Gongshen Liu, Jing Shao

Recent advances in Reinforcement Learning with Verifiable Rewards (RLVR) have demonstrated significant potential in single-turn reasoning tasks. With the paradigm shift toward self-evolving agentic learning, models are increasingly expected to learn from trajectories by synthesizing tools or accumulating explicit experiences. However, prevailing methods typically rely on large-scale LLMs or multi-agent frameworks, which hinders their deployment in resource-constrained environments. The inherent sparsity of outcome-based rewards also poses a substantial challenge, as agents typically receive feedback only upon completion of tasks. To address these limitations, we introduce SEARL, a tool-memory-based self-evolving agentic framework. Unlike approaches that directly utilize interaction experiences, our method constructs a structured experience memory that integrates planning with execution. This provides a novel state abstraction that facilitates generalization across analogous contexts, such as tool reuse. Consequently, agents extract explicit knowledge from historical data while leveraging inter-trajectory correlations to densify reward signals. We evaluate our framework on knowledge reasoning and mathematics tasks, demonstrating its effectiveness in achieving more practical and efficient learning.

Subjects: Artificial Intelligence, Machine Learning

Publish: 2026-04-09 04:38:47 UTC

2604.07791
