Beyond No Regret: Instance-Dependent PAC Reinforcement Learning

Authors: Andrew J Wagenmaker, Max Simchowitz, Kevin Jamieson

The theory of reinforcement learning has focused on two fundamental problems: achieving low regret, and identifying $\epsilon$-optimal policies. While a simple reduction allows one to apply a low-regret algorithm to obtain an $\epsilon$-optimal policy and achieve the worst-case optimal rate, it is unknown whether low-regret algorithms can obtain the instance-optimal rate for policy identification. We show this is not possible: there exists a fundamental tradeoff between achieving low regret and identifying an $\epsilon$-optimal policy at the instance-optimal rate. Motivated by our negative finding, we propose a new measure of instance-dependent sample complexity for PAC tabular reinforcement learning which explicitly accounts for the attainable state visitation distributions in the underlying MDP. We then propose and analyze a novel, planning-based algorithm which attains this sample complexity, yielding a complexity which scales with the suboptimality gaps and the "reachability" of a state. We show our algorithm is nearly minimax optimal, and on several examples that our instance-dependent sample complexity offers significant improvements over worst-case bounds.
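
For readers unfamiliar with the "simple reduction" mentioned in the abstract, the following is a sketch of the standard regret-to-PAC conversion, not the paper's own statement. The symbols $K$, $\pi_k$, $V^\star$, $V^{\pi}$, $R(K)$, $C$, and $\hat{\pi}$ are notation assumed here for illustration, and the $C\sqrt{K}$ regret bound is an assumed generic worst-case rate taken to hold in expectation. Suppose a low-regret algorithm plays policies $\pi_1, \dots, \pi_K$ over $K$ episodes, and let $\hat{\pi}$ be drawn uniformly at random from them. Then

$$
R(K) = \sum_{k=1}^{K} \left( V^\star - V^{\pi_k} \right) \le C\sqrt{K}
\quad\Longrightarrow\quad
\mathbb{E}\left[ V^\star - V^{\hat{\pi}} \right] = \frac{R(K)}{K} \le \frac{C}{\sqrt{K}} \le \epsilon
\quad \text{whenever} \quad K \ge \frac{C^2}{\epsilon^2}.
$$

Under these assumptions, $K \ge C^2/\epsilon^2$ episodes suffice for $\hat{\pi}$ to be $\epsilon$-optimal in expectation; a high-probability version follows from Markov's inequality at the cost of a worse dependence on the failure probability. The resulting bound depends only on the worst-case constant $C$, which is why this reduction alone says nothing about instance-optimal rates.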

Subject: COLT.2022 - Accept

wagenmaker22a@v178@PMLR
