Generalized Per-Agent Advantage Estimation for Multi-Agent Policy Optimization

Authors: Seongmin Kim, Giseung Park, Woojun Kim, Jiwon Jeon, Seungyeol Han, Youngchul Sung

In this paper, we propose a novel framework for multi-agent reinforcement learning that enhances sample efficiency and coordination through accurate per-agent advantage estimation. The core of our approach is the Generalized Per-Agent Advantage Estimator (GPAE), which employs a per-agent value iteration operator to compute precise per-agent advantages. This operator enables stable off-policy learning by estimating values indirectly via action probabilities, eliminating the need for direct Q-function estimation. To further refine the estimate, we introduce a double-truncated importance sampling ratio scheme. This scheme improves credit assignment for off-policy trajectories by balancing sensitivity to the agent's own policy changes against robustness to non-stationarity from other agents. Experiments on benchmarks demonstrate that our approach outperforms existing methods, excelling in coordination and sample efficiency in complex scenarios.
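The abstract does not state the formulas, so the following is only a minimal sketch of the general idea of truncated importance weights feeding an off-policy advantage recursion, in the style of Retrace or V-trace. It is not the paper's GPAE operator. The function names, the two caps (`own_cap`, `others_cap`), the split into an "own" ratio and a joint "others" ratio, and the recursion itself are all assumptions made for illustration.

```python
from math import prod


def truncated_ratio(pi_prob, mu_prob, cap):
    """Importance ratio pi/mu, clipped from above at `cap` (standard truncation)."""
    return min(cap, pi_prob / mu_prob)


def double_truncated_weight(own_pi, own_mu, others_pi, others_mu,
                            own_cap=2.0, others_cap=1.0):
    """Hypothetical two-cap weight (an assumption, not the paper's definition).

    The agent's own ratio is clipped at `own_cap`; the product of the other
    agents' ratios is clipped separately at `others_cap`, so that changes in
    the other agents' policies cannot blow up the weight.
    """
    own = truncated_ratio(own_pi, own_mu, own_cap)
    others_joint = prod(p / m for p, m in zip(others_pi, others_mu))
    others = min(others_cap, others_joint)
    return own * others


def off_policy_advantage(td_errors, weights, gamma=0.99, lam=0.95):
    """Backward recursion A_t = delta_t + gamma * lam * w_{t+1} * A_{t+1}.

    With all weights equal to 1 this reduces to ordinary GAE.
    """
    advantages = [0.0] * len(td_errors)
    next_adv = 0.0
    for t in reversed(range(len(td_errors))):
        # The weight of the *next* step scales how much of its advantage flows back.
        next_w = weights[t + 1] if t + 1 < len(weights) else 0.0
        advantages[t] = td_errors[t] + gamma * lam * next_w * next_adv
        next_adv = advantages[t]
    return advantages
```

In this sketch a looser cap on the agent's own ratio keeps the estimate responsive to that agent's policy updates, while a tighter cap on the joint ratio of the other agents limits variance from their changing behavior. The actual thresholds and functional form used in the paper may differ.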

Subject: Multiagent Systems

Publish: 2026-03-03 06:37:50 UTC

2603.02654
