Exploration-Driven Reinforcement Learning for Expert Routing Improvement in Mixture-of-Experts Language Models


The performance of MoE-based LLMs depends on the router's ability to select suitable experts; however, the router is typically not explicitly supervised to acquire this routing ability. We propose Exploration-Driven Reinforcement Learning (ERL), which explicitly optimizes the router by exploring alternative routing paths. For each input, ERL evaluates the model under (i) the original routing path and (ii) paths in which an α-fraction of routing decisions is randomly perturbed, and uses the resulting performance gap as the advantage signal for reinforcement learning. In addition, MoE-ERLwPL mitigates the risk of performance collapse caused by expert over-specialization, which routing reinforcement learning can induce, by intentionally enforcing overlap in the experts' knowledge. Without adding parameters or external reward models, our method improves summarization (SAMSum, XSUM), question answering (SQuAD), and language modeling (WikiText-2), and raises routing quality, delivering up to 8.9× higher MRR than baselines over 100 perturbed routing paths. Code is available on our GitHub.
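The exploration-and-advantage idea in the abstract can be illustrated with a small, framework-free sketch. It is not the authors' implementation, and several details are assumptions: each routing decision is a single (top-1) expert index, "perturbing" a decision means resampling a different expert uniformly at random, the original path acts as the baseline (advantage = perturbed-path score minus original-path score), and the router is updated with a REINFORCE-style loss. The helper names `perturb_routing`, `erl_advantages`, `path_log_prob`, and `reinforce_loss`, and the `score_fn` placeholder for task performance, are hypothetical.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple

Routing = List[int]  # hypothetical: one expert index per routing decision (top-1)


def perturb_routing(routing: Sequence[int], num_experts: int, alpha: float,
                    rng: random.Random) -> Routing:
    """Reassign an alpha-fraction of routing decisions to a different random expert."""
    perturbed = list(routing)
    n_flip = round(alpha * len(routing))
    for i in rng.sample(range(len(routing)), n_flip):
        # Assumption: choose uniformly among experts other than the original one.
        choices = [e for e in range(num_experts) if e != routing[i]]
        perturbed[i] = rng.choice(choices)
    return perturbed


def erl_advantages(routing: Sequence[int], score_fn: Callable[[Sequence[int]], float],
                   num_experts: int, alpha: float, num_paths: int,
                   seed: int = 0) -> Tuple[float, List[Tuple[Routing, float]]]:
    """Score the original path, then pair each perturbed path with its performance gap."""
    rng = random.Random(seed)
    base = score_fn(routing)
    results = []
    for _ in range(num_paths):
        path = perturb_routing(routing, num_experts, alpha, rng)
        # Assumed sign convention: positive if the explored path beats the original.
        results.append((path, score_fn(path) - base))
    return base, results


def path_log_prob(probs: Sequence[Sequence[float]], path: Sequence[int]) -> float:
    """Log-probability of a routing path under per-decision router probabilities."""
    return sum(math.log(probs[i][e]) for i, e in enumerate(path))


def reinforce_loss(probs: Sequence[Sequence[float]],
                   results: Sequence[Tuple[Routing, float]]) -> float:
    """REINFORCE-style loss: mean of -advantage * log pi(path) over explored paths."""
    return -sum(adv * path_log_prob(probs, p) for p, adv in results) / len(results)


# Toy usage: the "ideal" routing scores 0; each wrong decision costs 1 point.
ideal = [0, 1, 2, 3, 0, 1, 2, 3]
score = lambda r: -sum(a != b for a, b in zip(r, ideal))
base, results = erl_advantages(ideal, score, num_experts=4, alpha=0.25, num_paths=5)
uniform = [[0.25] * 4 for _ in ideal]
loss = reinforce_loss(uniform, results)
```

In this toy case every perturbed path scores worse than the original, so all advantages are negative and minimizing the loss would push probability mass away from the explored paths and back toward the original routing. In a real MoE model the log-probabilities would come from the router's softmax and the loss would be differentiated with an autograd framework; the scalar version here only shows the bookkeeping.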

Subject: EMNLP.2025 - Findings

2025.findings-emnlp.1282@ACL
