wei21b@v134@PMLR

#1 Non-stationary Reinforcement Learning without Prior Knowledge: an Optimal Black-box Approach

Authors: Chen-Yu Wei, Haipeng Luo

We propose a black-box reduction that turns a certain reinforcement learning algorithm with optimal regret in a (near-)stationary environment into another algorithm with optimal dynamic regret in a non-stationary environment, importantly without any prior knowledge of the degree of non-stationarity. By plugging different algorithms into our black box, we provide a list of examples showing that our approach not only recovers recent results for (contextual) multi-armed bandits achieved by very specialized algorithms, but also significantly improves the state of the art for (generalized) linear bandits, episodic MDPs, and infinite-horizon MDPs in various ways. Specifically, in most cases our algorithm achieves the optimal dynamic regret $\widetilde{O}(\min\{\sqrt{LT}, \Delta^{1/3}T^{2/3}\})$, where $T$ is the number of rounds and $L$ and $\Delta$ are the number and amount of changes of the world, respectively, while previous works only obtain suboptimal bounds and/or require knowledge of $L$ and $\Delta$.
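To make the two regimes of the bound concrete, the following is a minimal sketch that evaluates $\min\{\sqrt{LT}, \Delta^{1/3}T^{2/3}\}$ numerically, ignoring constants and the logarithmic factors hidden by $\widetilde{O}$. The function names `dynamic_regret_rate` and `dominant_term` are hypothetical and used only for illustration; they are not from the paper. The crossover condition follows from simple algebra: $\sqrt{LT} \le \Delta^{1/3}T^{2/3}$ exactly when $L \le \Delta^{2/3}T^{1/3}$.

```python
import math


def dynamic_regret_rate(T, L, Delta):
    """Evaluate min{sqrt(L*T), Delta^(1/3) * T^(2/3)}.

    Illustrative only: constants and log factors are dropped, so the
    value indicates the order of growth, not an actual regret figure.
    """
    switching_term = math.sqrt(L * T)                  # depends on the number of changes L
    variation_term = Delta ** (1 / 3) * T ** (2 / 3)   # depends on the amount of change Delta
    return min(switching_term, variation_term)


def dominant_term(T, L, Delta):
    """Report which term attains the minimum.

    sqrt(L*T) <= Delta^(1/3) * T^(2/3)  iff  L <= Delta^(2/3) * T^(1/3).
    """
    if L <= Delta ** (2 / 3) * T ** (1 / 3):
        return "sqrt(LT)"
    return "Delta^(1/3) T^(2/3)"


if __name__ == "__main__":
    # Few abrupt changes of large magnitude: the sqrt(LT) term is smaller.
    print(dominant_term(T=10_000, L=4, Delta=1_000))
    # Many tiny changes: the Delta^(1/3) T^(2/3) term is smaller.
    print(dominant_term(T=1_000_000, L=10_000, Delta=1))
```

Under these assumptions, an environment with a handful of large jumps is governed by the $\sqrt{LT}$ term, while one that drifts slowly through many small changes is governed by the $\Delta^{1/3}T^{2/3}$ term.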

Subject: COLT.2021 - Award