Ensemble-MIX: Enhancing Sample Efficiency in Multi-Agent RL Using Ensemble Methods

Multi-agent reinforcement learning (MARL) methods have achieved state-of-the-art results on a range of multi-agent tasks. Yet MARL algorithms typically require significantly more environment interactions than their single-agent counterparts to converge, a problem exacerbated by the difficulty of exploring a large joint action space and by the high variance intrinsic to MARL environments. To tackle these issues, we propose a novel algorithm that combines a decomposed centralized critic with decentralized ensemble learning. The main component of our scheme is a selective exploration method that leverages ensemble kurtosis: we extend the global decomposed critic with a diversity-regularized ensemble of individual critics and use its excess kurtosis to guide exploration toward high-uncertainty states and actions. To improve sample efficiency, we train the centralized critic with a novel truncated variation of the TD($\lambda$) algorithm, enabling efficient off-policy learning with reduced variance. On the actor side, our algorithm adapts the mixed-samples approach to MARL, combining on-policy and off-policy loss functions to train the actors. This approach balances stability and efficiency and outperforms purely off-policy learning. Our evaluation shows that the method outperforms state-of-the-art baselines on standard MARL benchmarks, including a variety of SMAC II maps.
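The abstract does not give formulas for the two quantities it names, so the following is only a minimal sketch of their textbook forms, not the authors' implementation. It assumes the exploration signal is the sample excess kurtosis of an ensemble's value estimates for one state-action pair, and it uses the standard truncated $\lambda$-return in place of the paper's novel variant, which is not specified here. The function names `excess_kurtosis` and `truncated_lambda_return`, and all inputs, are hypothetical.

```python
from typing import Sequence


def excess_kurtosis(estimates: Sequence[float]) -> float:
    """Sample excess kurtosis (m4 / m2^2 - 3) of ensemble value estimates.

    Assumption: a degenerate ensemble (zero variance) is mapped to 0.0.
    """
    k = len(estimates)
    mean = sum(estimates) / k
    m2 = sum((x - mean) ** 2 for x in estimates) / k  # second central moment
    m4 = sum((x - mean) ** 4 for x in estimates) / k  # fourth central moment
    if m2 == 0.0:
        return 0.0
    return m4 / (m2 * m2) - 3.0


def truncated_lambda_return(
    rewards: Sequence[float],
    next_values: Sequence[float],
    gamma: float,
    lam: float,
) -> float:
    """Textbook n-step truncated lambda-return for the first step of a segment.

    rewards[i] is r_{t+i} and next_values[i] is V(s_{t+i+1}), for i in 0..n-1.
    Uses the backward recursion
        G_i = r_i + gamma * ((1 - lam) * V(s_{i+1}) + lam * G_{i+1}),
    with G_n = V(s_n), i.e. the segment bootstraps from the last value.
    """
    if len(rewards) != len(next_values) or not rewards:
        raise ValueError("rewards and next_values must be non-empty and equal length")
    g = next_values[-1]  # bootstrap value at the truncation point
    for r, v_next in zip(reversed(rewards), reversed(next_values)):
        g = r + gamma * ((1.0 - lam) * v_next + lam * g)
    return g


if __name__ == "__main__":
    # A heavy-tailed ensemble (one outlier critic) vs. an evenly spread one.
    print(excess_kurtosis([0.0, 0.0, 0.0, 0.0, 10.0]))
    print(excess_kurtosis([1.0, 2.0, 3.0, 4.0, 5.0]))
    # Two-step segment with hypothetical rewards and critic values.
    print(truncated_lambda_return([1.0, 1.0], [0.5, 2.0], gamma=0.9, lam=0.5))
```

In this sketch, an ensemble with a single outlier critic yields positive excess kurtosis while an evenly spread ensemble yields a negative value; how such a score is turned into an exploration bonus is left open by the abstract. Setting `lam=0` recovers the one-step TD target and `lam=1` the full n-step return.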

Subjects: Systems and Control, Machine Learning

Published: 2025-06-03 13:13:15 UTC

2506.02841
