#1 Q-Supervised Contrastive Representation: A State Decoupling Framework for Safe Offline Reinforcement Learning

Authors: Zhihe Yang, Yunjian Xu, Yang Zhang

Safe offline reinforcement learning (RL), which aims to learn a safety-guaranteed policy without risky online interaction with the environment, has recently attracted growing attention for safety-critical scenarios. However, existing approaches encounter out-of-distribution problems at test time, which can lead to potentially unsafe outcomes. This issue arises from the infinite possible combinations of reward-related and cost-related states. In this work, we propose State Decoupling with Q-supervised Contrastive representation (SDQC), a novel framework that decouples global observations into reward- and cost-related representations for decision-making, thereby improving generalization to unfamiliar global observations. In contrast to classical representation learning methods, which typically require model-based estimation (e.g., bisimulation), we theoretically prove that our Q-supervised method yields a coarser representation while preserving the optimal policy, resulting in improved generalization performance. Experiments on DSRL benchmark problems provide compelling evidence that SDQC surpasses the baseline algorithms; notably, it achieves almost zero violations in more than half of the tasks, whereas the state-of-the-art algorithm reaches the same level in only a quarter of them. Furthermore, we demonstrate that SDQC generalizes better when confronted with unseen environments.
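
The abstract gives no implementation details, so the following is only a rough, non-authoritative sketch of the kind of objective a "Q-supervised" contrastive representation might use: states are assigned pseudo-labels by binning a critic's Q-values, and a SupCon-style loss pulls together embeddings of states in the same Q-bin. Every name and hyperparameter here (the function, the quantile binning scheme, `num_bins`, `temperature`) is an assumption for illustration, not the authors' method.

```python
import torch
import torch.nn.functional as F


def q_supervised_contrastive_loss(z, q_values, num_bins=10, temperature=0.1):
    """Hypothetical SupCon-style loss supervised by discretized Q-values.

    States whose Q-values fall into the same quantile bin are treated as
    positive pairs: their embeddings are pulled together, all others apart.
    """
    z = F.normalize(z, dim=1)  # unit-norm embeddings, shape (B, d)

    # Pseudo-labels: bucket each Q-value by the batch quantiles.
    edges = torch.quantile(
        q_values, torch.linspace(0.0, 1.0, num_bins + 1, device=q_values.device)
    )
    labels = torch.bucketize(q_values, edges[1:-1])  # shape (B,), in [0, num_bins)

    sim = (z @ z.t()) / temperature  # pairwise cosine similarities
    self_mask = torch.eye(len(z), dtype=torch.bool, device=z.device)
    sim = sim.masked_fill(self_mask, float("-inf"))  # exclude self-pairs
    log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)

    # Positives: same Q-bin, excluding self.
    pos = (labels.unsqueeze(0) == labels.unsqueeze(1)) & ~self_mask
    pos_counts = pos.sum(dim=1)

    # Mean log-likelihood of positives per anchor; skip anchors with no positives.
    pos_log_prob = log_prob.masked_fill(~pos, 0.0)
    per_anchor = -pos_log_prob.sum(dim=1) / pos_counts.clamp(min=1)
    return per_anchor[pos_counts > 0].mean()


# Toy usage with random inputs (illustration only).
z = torch.randn(32, 64)  # batch of encoded state representations
q = torch.randn(32)      # critic's Q-value estimates for the batch
print(q_supervised_contrastive_loss(z, q))
```

In this reading, the Q-values act as a task-aware supervision signal: two states are "the same" for decision-making purposes whenever they support the same values, which is one plausible way a representation could end up coarser than a bisimulation-style one while still preserving the optimal policy.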

Subject: ICML.2025 - Poster