Quantile of Means: A Bonus-Free Ensemble Method for Minimax Optimal Reinforcement Learning

2606.20107

Authors: Asaf Cassel, Aviv Rosenberg

Optimal Reinforcement Learning (RL) algorithms typically rely on carefully constructed count-based uncertainty estimates to drive exploration. Although theoretically sound, such estimates are hard to compute in practical settings and therefore offer limited insight for designing exploration heuristics. Meanwhile, ensembling has emerged as a practical approach, but remains without theoretical justification. Building on a recent ensemble-based method for Multi-Armed Bandits, we propose a quantile-based ensemble method for finite-horizon Markov Decision Processes (MDPs). Our simple count-free approach achieves optimal variance-dependent regret bounds, providing theoretical grounding for ensemble-based exploration in RL.
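
To make the idea of a quantile taken over an ensemble of means more concrete, here is a minimal sketch in the simpler multi-armed bandit setting. It is not the algorithm from the paper, whose details the abstract does not give. Everything in it is an assumption made for illustration: the names `QuantileOfMeansArm`, `run_bandit`, and `quantile`, the Bernoulli(1/2) masking that decides which ensemble members see each reward, the optimistic default of 1.0 for members with no data, and the choice of 20 members with quantile level 0.9. The only point it illustrates is that an upper quantile across randomized mean estimates can serve as an exploration index without any explicit count-based bonus.

```python
import random


def quantile(values, q):
    """Empirical q-quantile using a simple nearest-rank rule on sorted values."""
    ordered = sorted(values)
    idx = min(len(ordered) - 1, int(q * len(ordered)))
    return ordered[idx]


class QuantileOfMeansArm:
    """Hypothetical per-arm ensemble of running means (illustrative only).

    Each ensemble member averages a random subset of the arm's rewards,
    so the members disagree more when the arm has been pulled less often.
    """

    def __init__(self, n_members, rng):
        self.rng = rng
        self.sums = [0.0] * n_members
        self.counts = [0] * n_members

    def update(self, reward):
        for i in range(len(self.sums)):
            # Assumption: each member includes the reward with probability 1/2.
            if self.rng.random() < 0.5:
                self.sums[i] += reward
                self.counts[i] += 1

    def index(self, q):
        # Assumption: rewards lie in [0, 1], so an empty member defaults to 1.0.
        means = [s / c if c > 0 else 1.0 for s, c in zip(self.sums, self.counts)]
        return quantile(means, q)


def run_bandit(arm_probs, horizon, n_members=20, q=0.9, seed=0):
    """Play a Bernoulli bandit, always pulling the arm with the largest index."""
    rng = random.Random(seed)
    arms = [QuantileOfMeansArm(n_members, rng) for _ in arm_probs]
    pulls = [0] * len(arm_probs)
    for _ in range(horizon):
        chosen = max(range(len(arms)), key=lambda i: arms[i].index(q))
        reward = 1.0 if rng.random() < arm_probs[chosen] else 0.0
        arms[chosen].update(reward)
        pulls[chosen] += 1
    return pulls


if __name__ == "__main__":
    # Two arms with success probabilities 0.3 and 0.7, played for 2000 rounds.
    print(run_bandit([0.3, 0.7], horizon=2000))
```

In this sketch the spread of the ensemble plays the role that a count-based bonus plays in classical optimistic algorithms: as an arm accumulates data, its member means concentrate and the upper quantile approaches the plain empirical mean.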

Subject: Machine Learning

Published: 2026-06-18 11:30:59 UTC