Hu7hUjEMiW@OpenReview


Automatic Reward Shaping from Confounded Offline Data

Authors: Mingxuan Li, Junzhe Zhang, Elias Bareinboim

Reward shaping has been shown to be an effective technique for accelerating the learning of reinforcement learning (RL) agents. While successful in empirical applications, the design of a good shaping function is less well understood in principle and therefore often relies on domain expertise and manual design. To overcome this limitation, we propose a novel automated approach for designing reward functions from offline data that may be contaminated with unobserved confounding bias. We propose to use causal state value upper bounds, calculated from offline datasets, as a conservative optimistic estimate of the optimal state value, which is then used as the state potential in Potential-Based Reward Shaping (PBRS). When our shaping function is applied to a model-free learner based on UCB principles, we show that the learner enjoys a better gap-dependent regret bound than the same learner without shaping. To the best of our knowledge, this is the first gap-dependent regret bound for PBRS in model-free learning with online exploration. Simulations support the theoretical findings.
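
The role of state potentials in PBRS can be illustrated with a minimal sketch. It uses the standard PBRS form, in which the shaping term gamma * phi(s') - phi(s) is added to the environment reward. Everything else is an assumption made for illustration and is not taken from the paper: the helper name make_shaped_reward, the state names, the numeric upper bounds, the discount factor, and the convention that the potential of a terminal state is zero. How the causal upper bounds are computed from confounded data is not shown here.

```python
from typing import Callable, Dict, Hashable

State = Hashable


def make_shaped_reward(
    potential: Dict[State, float], gamma: float
) -> Callable[..., float]:
    """Return a function that adds a potential-based shaping term to a reward.

    `potential` maps each state to its potential phi(s). In the setting of
    the abstract this would hold value upper bounds estimated from offline
    data; here the numbers are simply supplied by the caller.
    """

    def shaped(reward: float, state: State, next_state: State,
               done: bool = False) -> float:
        # Assumed convention: a terminal next state has potential zero.
        next_phi = 0.0 if done else potential[next_state]
        # Standard PBRS term: gamma * phi(s') - phi(s).
        return reward + gamma * next_phi - potential[state]

    return shaped


# Hypothetical upper bounds on the optimal state value (illustrative only).
upper_bounds = {"s0": 1.0, "s1": 2.0}
shape = make_shaped_reward(upper_bounds, gamma=0.9)

# Moving from s0 to s1 raises the potential, so the shaped reward is positive
# even though the environment reward is zero.
bonus = shape(0.0, "s0", "s1")
```

A learner would call shape(r, s, s_next, done) on each transition in place of the raw reward r. Because the added term depends only on potentials of the two states, the shaping terms telescope along a trajectory, which is the property that lets PBRS leave the optimal policy unchanged.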

Subject: ICML.2025 - Poster