LGR2: Language Guided Reward Relabeling for Accelerating Hierarchical Reinforcement Learning

Authors: Utsav Singh, Pramit Bhattacharyya, Vinay P. Namboodiri

Large language models (LLMs) have shown remarkable abilities in logical reasoning, in-context learning, and code generation. However, translating natural language instructions into effective robotic control policies remains a significant challenge, especially for tasks that require long-horizon planning and provide only sparse rewards. Hierarchical Reinforcement Learning (HRL) offers a natural framework for such tasks in robotics, but it typically suffers from non-stationarity: the behavior of the lower-level policy changes during training, which destabilizes higher-level policy learning. We introduce LGR2, a novel HRL framework that uses LLMs to generate language-guided reward functions for the higher-level policy. Because the high-level reward is generated independently of the changing lower-level policy, LGR2 mitigates the non-stationarity problem in off-policy HRL and enables stable, efficient learning. To further improve sample efficiency in sparse-reward environments, we integrate goal-conditioned hindsight experience relabeling. Extensive experiments on simulated and real-world robotic navigation and manipulation tasks demonstrate that LGR2 outperforms both hierarchical and non-hierarchical baselines, achieving success rates above 55% on challenging tasks and transferring robustly to real robots without additional fine-tuning.
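The abstract pairs two ideas: computing the higher-level reward from a fixed reward function of states rather than from whatever the changing lower-level policy happened to collect, and goal-conditioned hindsight experience relabeling. The sketch below illustrates both under stated assumptions. It is standard HER with the "future" goal-sampling strategy applied to higher-level transitions, and `language_reward` is a hypothetical, hand-written distance threshold standing in for an LLM-generated reward function. `Transition`, `language_reward`, and `her_relabel` are illustrative names, not the authors' code or LGR2's actual reward.

```python
import random
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple

State = Tuple[float, float]


@dataclass
class Transition:
    """One higher-level transition; the subgoal is the high-level action."""
    state: State
    subgoal: State
    next_state: State  # state actually reached by the lower-level policy
    goal: State
    reward: float


def distance(a: State, b: State) -> float:
    return ((a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2) ** 0.5


def language_reward(state: State, goal: State, tol: float = 0.1) -> float:
    # Hypothetical stand-in for an LLM-generated reward function:
    # sparse reward of 0 within `tol` of the goal, -1 otherwise.
    return 0.0 if distance(state, goal) <= tol else -1.0


def her_relabel(episode: Sequence[Transition], reward_fn, k: int = 4,
                rng: Optional[random.Random] = None) -> List[Transition]:
    if rng is None:
        rng = random.Random(0)
    relabeled: List[Transition] = []
    for t, tr in enumerate(episode):
        # Recompute the original transition's reward from states and goal only,
        # so it does not depend on the lower-level policy's collected reward.
        relabeled.append(Transition(tr.state, tr.subgoal, tr.next_state,
                                    tr.goal, reward_fn(tr.next_state, tr.goal)))
        # HER "future" strategy: substitute k goals achieved at step t or later.
        future = list(episode[t:])
        for _ in range(k):
            new_goal = rng.choice(future).next_state
            relabeled.append(Transition(tr.state, tr.subgoal, tr.next_state,
                                        new_goal, reward_fn(tr.next_state, new_goal)))
    return relabeled


# Usage: a three-step episode that never reaches the original goal (5, 5).
episode = [
    Transition((0.0, 0.0), (1.0, 0.0), (1.0, 0.0), (5.0, 5.0), -1.0),
    Transition((1.0, 0.0), (2.0, 0.0), (2.0, 0.0), (5.0, 5.0), -1.0),
    Transition((2.0, 0.0), (3.0, 0.0), (3.0, 0.0), (5.0, 5.0), -1.0),
]
buffer = her_relabel(episode, language_reward, k=2)
print(len(buffer))  # 9
```

Every original transition fails, but relabeling with achieved goals yields some transitions with reward 0, which is the source of HER's sample-efficiency gain under sparse rewards. Because `reward_fn` depends only on states and goals, relabeled rewards stay consistent even as the lower-level policy changes.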

Subjects: Machine Learning, Computation and Language, Robotics

Published: 2024-06-09 18:40:24 UTC

2406.05881