
#1 Learning Robust Representation for Reinforcement Learning with Distractions by Reward Sequence Prediction

Authors: Qi Zhou, Jie Wang, Qiyuan Liu, Yufei Kuang, Wengang Zhou, Houqiang Li

Our method learns robust representations by predicting reward sequences via a novel TD-style algorithm, achieving state-of-the-art sample efficiency and generalization in environments with distractions.

Reinforcement learning algorithms have achieved impressive success in learning behaviors from pixels. However, their application to real-world tasks remains challenging because of their sensitivity to visual distractions (e.g., changes in viewpoint and lighting). A major reason is that the learned representations often overfit to task-irrelevant information. By comparing several representation learning methods, we find that the key to robust representation learning is the choice of prediction targets. We therefore propose a novel representation learning approach, Reward Sequence Prediction (RSP), that uses reward sequences or their transforms (e.g., the discrete-time Fourier transform) as prediction targets. RSP can learn robust representations efficiently because reward sequences rarely contain task-irrelevant information while still providing enough supervisory signal to accelerate representation learning. An appealing feature is that RSP makes no assumptions about the type of distraction and can therefore improve performance even when multiple types of distraction are present. We evaluate our approach on the Distracting Control Suite; experiments show that it achieves state-of-the-art sample efficiency and generalization in tasks with distractions.
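To make the core idea concrete, below is a minimal PyTorch sketch of reward-sequence prediction as an auxiliary representation loss: an encoder maps pixels to a latent state, and a head regresses the next k rewards given that latent and the corresponding action sequence. All names (`Encoder`, `RewardSeqHead`, `rsp_loss`), the k-step MSE target, and the network shapes are hypothetical illustrations; the paper's actual TD-style algorithm and its DTFT-transformed targets may differ.

```python
# Hypothetical sketch of reward-sequence prediction as a representation loss.
# Not the paper's implementation; a k-step regression stand-in for its
# TD-style algorithm and DTFT-transformed targets.
import torch
import torch.nn as nn

class Encoder(nn.Module):
    """Maps an image observation to a latent representation."""
    def __init__(self, obs_shape=(3, 84, 84), latent_dim=50):
        super().__init__()
        c, h, w = obs_shape
        self.conv = nn.Sequential(
            nn.Conv2d(c, 32, 3, stride=2), nn.ReLU(),
            nn.Conv2d(32, 32, 3, stride=2), nn.ReLU(),
            nn.Flatten(),
        )
        with torch.no_grad():  # infer flattened conv output size
            n = self.conv(torch.zeros(1, c, h, w)).shape[1]
        self.fc = nn.Linear(n, latent_dim)

    def forward(self, obs):
        return self.fc(self.conv(obs))

class RewardSeqHead(nn.Module):
    """Predicts the next k rewards from a latent state and an action sequence."""
    def __init__(self, latent_dim=50, action_dim=6, k=5, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(latent_dim + k * action_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, k),
        )

    def forward(self, z, actions):  # actions: (B, k, action_dim)
        return self.net(torch.cat([z, actions.flatten(1)], dim=-1))

def rsp_loss(encoder, head, obs, actions, rewards):
    """MSE between predicted and observed k-step reward sequences.

    obs: (B, C, H, W); actions: (B, k, A); rewards: (B, k).
    Rewards carry little task-irrelevant information, so gradients
    through this loss push the encoder toward task-relevant features.
    """
    pred = head(encoder(obs), actions)
    return nn.functional.mse_loss(pred, rewards)

# Toy usage with random tensors standing in for a replay-buffer batch.
enc, head = Encoder(), RewardSeqHead()
obs = torch.randn(8, 3, 84, 84)
actions = torch.randn(8, 5, 6)
rewards = torch.randn(8, 5)
loss = rsp_loss(enc, head, obs, actions, rewards)
loss.backward()
print(loss.item())
```

In practice such a loss would be optimized jointly with the RL objective, so the encoder is shaped by both; the sketch isolates only the auxiliary prediction term.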

Subject: UAI.2023 - Accept