Distilling Reinforcement Learning Algorithms for In-Context Model-Based Planning

Authors: Jaehyeon Son, Soochan Lee, Gunhee Kim

Recent studies have demonstrated that Transformers can perform in-context reinforcement learning (RL) by imitating a source RL algorithm. This enables them to adapt to new tasks in a sample-efficient manner without parameter updates. However, because the Transformers are trained to mimic the source algorithm, they also reproduce its suboptimal behaviors. Model-based planning offers a promising solution to this limitation: by letting the agent simulate potential outcomes before taking an action, it provides an additional mechanism for deviating from the source algorithm's behavior. Rather than learning a separate dynamics model, we propose Distillation for In-Context Planning (DICP), an in-context model-based RL framework in which the Transformer simultaneously learns the environment dynamics and improves the policy in-context. With experiments across a diverse set of discrete and continuous environments, such as Darkroom variants and Meta-World, we show that this method achieves state-of-the-art performance while requiring significantly fewer environment interactions than the baselines, which include both in-context model-free counterparts and existing meta-RL methods.
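The idea of simulating outcomes before acting can be illustrated with a minimal sketch. This is not the authors' implementation: the corridor environment, `toy_model`, `toy_policy`, `plan`, and all parameter values below are hypothetical stand-ins, and a plain Python function takes the place of the Transformer that, in DICP, supplies both the action proposals and the dynamics predictions. The sketch only shows how ranking candidate actions by imagined return lets an agent deviate from a suboptimal imitated policy.

```python
import random
from typing import Callable, Tuple

State = int
Action = int  # -1 = left, 0 = stay, +1 = right

GOAL = 7   # hypothetical goal cell
SIZE = 10  # hypothetical corridor length


def toy_model(state: State, action: Action) -> Tuple[State, float]:
    """Stand-in for learned dynamics: predicts (next state, reward)."""
    nxt = min(max(state + action, 0), SIZE - 1)
    return nxt, (1.0 if nxt == GOAL else 0.0)


def toy_policy(state: State, rng: random.Random) -> Action:
    """Stand-in for the imitated source policy; deliberately suboptimal."""
    return rng.choice([-1, 0, 1])


def plan(
    state: State,
    policy: Callable[[State, random.Random], Action],
    model: Callable[[State, Action], Tuple[State, float]],
    rng: random.Random,
    n_candidates: int = 16,
    horizon: int = 5,
) -> Action:
    """Sample candidate first actions, roll each out in the model,
    and return the first action with the highest imagined return."""
    best_action, best_return = 0, float("-inf")
    for _ in range(n_candidates):
        first = policy(state, rng)
        s, a, total = state, first, 0.0
        for _ in range(horizon):
            s, r = model(s, a)   # imagined transition, no real env step
            total += r
            a = policy(s, rng)   # continue the rollout with the policy
        if total > best_return:
            best_action, best_return = first, total
    return best_action


if __name__ == "__main__":
    rng = random.Random(0)
    state = 0
    for step in range(20):
        action = plan(state, toy_policy, toy_model, rng)
        state, reward = toy_model(state, action)  # model doubles as the env here
        print(step, action, state, reward)
```

In this toy setting the same function serves as both the "learned" model and the real environment, so the imagined rollouts are exact; with a model learned in-context, prediction errors would make the ranking of candidates noisier, especially early in a task.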

Subject: ICLR.2025 - Poster

BfUugGfBE5@OpenReview
