5ryn8tYWHL@OpenReview

Total: 1

#1 Conservative Offline Goal-Conditioned Implicit V-Learning [PDF2] [Copy] [Kimi1] [REL]

Authors: Ke Kaiqiang, qian lin, Zongkai Liu, Shenghong He, Chao Yu

Offline goal-conditioned reinforcement learning (GCRL) learns a goal-conditioned value function to train policies for diverse goals with pre-collected datasets. Hindsight experience replay addresses the issue of sparse rewards by treating intermediate states as goals but fails to complete goal-stitching tasks where achieving goals requires stitching different trajectories. While cross-trajectory sampling is a potential solution that associates states and goals belonging to different trajectories, we demonstrate that this direct method degrades performance in goal-conditioned tasks due to the overestimation of values on unconnected pairs. To this end, we propose Conservative Goal-Conditioned Implicit Value Learning (CGCIVL), a novel algorithm that introduces a penalty term to penalize value estimation for unconnected state-goal pairs and leverages the quasimetric framework to accurately estimate values for connected pairs. Evaluations on OGBench, a benchmark for offline GCRL, demonstrate that CGCIVL consistently surpasses state-of-the-art methods across diverse tasks.

Subject: ICML.2025 - Poster