A Recipe for Stable Offline Multi-agent Reinforcement Learning

Authors: Dongsu Lee, Daehee Lee, Amy Zhang

Despite remarkable achievements in single-agent offline reinforcement learning (RL), multi-agent RL (MARL) has struggled to adopt this paradigm, largely persisting with on-policy training and self-play from scratch. One reason for this gap is the instability of non-linear value decomposition, which has led prior works to avoid complex mixing networks in favor of linear value decomposition (e.g., VDN) combined with the value regularization used in single-agent setups. In this work, we analyze the source of instability in non-linear value decomposition within the offline MARL setting. Our observations confirm that non-linear value decomposition induces value-scale amplification and unstable optimization. To alleviate this, we propose a simple technique, scale-invariant value normalization (SVN), that stabilizes actor-critic training without altering the Bellman fixed point. Empirically, we examine the interaction among key components of offline MARL (e.g., value decomposition, value learning, and policy extraction) and derive a practical recipe that unlocks its full potential.
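
To make the contrast between linear and non-linear value decomposition concrete, the sketch below compares a VDN-style sum with a small monotonic mixing network. It is illustrative only: the function names, the fixed weights, and the two-layer shape are assumptions made for this example (mixers such as QMIX generate their weights from the global state), and it does not implement SVN, whose details the abstract does not give.

```python
import math
from typing import List


def vdn_mix(agent_qs: List[float]) -> float:
    """Linear decomposition (VDN): the joint value is the sum of per-agent values."""
    return sum(agent_qs)


def elu(x: float) -> float:
    """ELU activation: identity for positive inputs, exp(x) - 1 otherwise."""
    return x if x > 0 else math.exp(x) - 1.0


def monotonic_mix(
    agent_qs: List[float],
    w1: List[List[float]],  # hypothetical hidden-layer weights, one row per hidden unit
    w2: List[float],        # hypothetical output weights, one per hidden unit
) -> float:
    """Two-layer mixer; taking abs() of the weights keeps dQ_tot/dQ_i >= 0."""
    hidden = [
        elu(sum(abs(w) * q for w, q in zip(row, agent_qs)))
        for row in w1
    ]
    return sum(abs(w) * h for w, h in zip(w2, hidden))


# Three agents with modest per-agent values.
agent_qs = [1.0, 2.0, 3.0]

# Assumed fixed weights, chosen only to show how a mixer can rescale values.
w1 = [[2.0, 2.0, 2.0], [2.0, 2.0, 2.0]]
w2 = [2.0, 2.0]

q_linear = vdn_mix(agent_qs)                # 6.0
q_mixed = monotonic_mix(agent_qs, w1, w2)   # 48.0
```

With these weights the mixed joint value is eight times the linear sum for the same per-agent values, which gives a rough sense of how a mixing network can change the value scale while a plain sum cannot.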

Subjects: Machine Learning, Artificial Intelligence, Robotics

Publish: 2026-03-09 13:57:08 UTC

2603.08399
