EVOM: Agentic Meta-Evolution of Actor-Critic Architectures for Reinforcement Learning

Authors: Boyun Zhang, Chao Wang, Kai Wu

In actor-critic reinforcement learning, network architectures are typically designed by hand. Automating this design is challenging because each candidate must be trained before it can be evaluated, and the design space is open-ended. To address these challenges, we introduce EVOM, an agentic meta-evolution framework for discovering high-performance actor-critic architectures. We frame architecture search as a bi-level optimization: an inner loop trains weights via low-fidelity proximal policy optimization (PPO), while an outer loop drives meta-evolution by iteratively refining architecture programs. Crucially, this outer loop is powered by an LLM-based design agent that operates purely as an architecture designer, completely decoupled from policy execution and environment control. Experiments on Ant-v4 and HalfCheetah-v4 show that EVOM outperforms the manually designed baseline, an LLM-guided random search, and the state-of-the-art LLM-guided programmatic policy search method MLES. Ablation studies confirm that both the meta-evolution loop and the LLM Design Agent are indispensable for final performance.
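To make the bi-level structure concrete, the following is a minimal sketch of an outer evolutionary loop wrapped around an inner evaluation step. It is not the authors' implementation: `ArchSpec`, `propose_variant`, `toy_inner_loop`, and `meta_evolve` are hypothetical names, the design agent is replaced by a random mutation, and the low-fidelity PPO run is replaced by a synthetic score, so that the example runs without any RL dependencies.

```python
import random
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass(frozen=True)
class ArchSpec:
    """Hypothetical architecture description: hidden widths for actor and critic."""
    actor_hidden: Tuple[int, ...]
    critic_hidden: Tuple[int, ...]


def propose_variant(parent: ArchSpec, rng: random.Random) -> ArchSpec:
    """Stand-in for the LLM design agent (assumption): halve or double one layer."""
    def mutate(widths: Tuple[int, ...]) -> Tuple[int, ...]:
        out = list(widths)
        i = rng.randrange(len(out))
        out[i] = max(8, min(512, int(out[i] * rng.choice([0.5, 2.0]))))
        return tuple(out)

    if rng.random() < 0.5:
        return ArchSpec(mutate(parent.actor_hidden), parent.critic_hidden)
    return ArchSpec(parent.actor_hidden, mutate(parent.critic_hidden))


def toy_inner_loop(arch: ArchSpec) -> float:
    """Stand-in for a short PPO run (assumption): synthetic score, best at width 128."""
    widths = arch.actor_hidden + arch.critic_hidden
    return -sum(abs(w - 128) for w in widths) / len(widths)


def meta_evolve(
    seed_arch: ArchSpec,
    evaluate: Callable[[ArchSpec], float],
    generations: int = 20,
    population: int = 4,
    children: int = 4,
    seed: int = 0,
) -> Tuple[float, ArchSpec]:
    """Outer loop: propose variants, score each with the inner loop, keep the best."""
    rng = random.Random(seed)
    pool = [(evaluate(seed_arch), seed_arch)]
    for _ in range(generations):
        parents = [arch for _, arch in pool]
        for _ in range(children):
            child = propose_variant(rng.choice(parents), rng)
            pool.append((evaluate(child), child))  # inner-loop evaluation
        # Elitist selection: survivors are the top-scoring candidates so far.
        pool.sort(key=lambda item: item[0], reverse=True)
        pool = pool[:population]
    return pool[0]


if __name__ == "__main__":
    start = ArchSpec(actor_hidden=(64, 64), critic_hidden=(64, 64))
    best_score, best_arch = meta_evolve(start, toy_inner_loop)
    print(best_score, best_arch)
```

In a real setting, `evaluate` would train the candidate's weights for a limited budget and return an episodic return, and `propose_variant` would be a call to a design agent that edits an architecture program. Because selection here is elitist, the best score in the pool never falls below that of the seed architecture.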

Subjects: Machine Learning, Artificial Intelligence

Publish: 2026-06-24 19:13:32 UTC

2606.26327
