This paper studies meta-reinforcement learning with adaptation from human feedback. It aims to pre-train a meta-model that achieves few-shot adaptation to new tasks from human preference queries, without relying on reward signals. To solve this problem, we propose the framework *adaptation via Preference-Order-preserving EMbedding* (POEM). During meta-training, the framework learns a task encoder, which maps tasks to a preference-order-preserving task embedding space, and a decoder, which maps the embeddings to task-specific policies. During adaptation from human feedback, the task encoder enables efficient inference of a new task's embedding from preference queries, from which the decoder yields the task-specific policy. We provide a theoretical guarantee that the adaptation process converges to the task-specific optimal policy, and we experimentally demonstrate state-of-the-art performance with substantial improvements over baseline methods.
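To make the encoder/decoder structure and the preference-based adaptation step concrete, the following is a minimal sketch in PyTorch. It is not the paper's implementation: the module names (`TaskEncoder`, `PolicyDecoder`), the dot-product scoring of trajectory segments against the embedding, and the Bradley-Terry-style preference loss are all illustrative assumptions; only the overall flow (learn encoder/decoder, then infer a new task's embedding from preference queries alone and decode it into a policy) mirrors the abstract.

```python
# Illustrative sketch of the POEM-style pipeline; component designs are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

EMBED_DIM, OBS_DIM, ACT_DIM = 8, 4, 2

class TaskEncoder(nn.Module):
    """Maps task data (e.g., trajectory summaries) to a task embedding.
    Trained during meta-training (training loop not shown here)."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(OBS_DIM, 32), nn.ReLU(),
                                 nn.Linear(32, EMBED_DIM))

    def forward(self, task_data):
        return self.net(task_data)

class PolicyDecoder(nn.Module):
    """Maps a task embedding plus an observation to action logits,
    i.e., decodes the embedding into a task-specific policy."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(EMBED_DIM + OBS_DIM, 32), nn.ReLU(),
                                 nn.Linear(32, ACT_DIM))

    def forward(self, z, obs):
        return self.net(torch.cat([z, obs], dim=-1))

def preference_loss(score_a, score_b, prefers_a):
    """Bradley-Terry-style loss: the preferred segment should score higher."""
    return F.binary_cross_entropy_with_logits(score_a - score_b, prefers_a.float())

def adapt_embedding(feats_a, feats_b, prefers_a, steps=100, lr=0.05):
    """Infer a new task's embedding from preference queries only (no rewards).
    Segment scores are modeled as dot products with the embedding (an assumption)."""
    z = torch.zeros(EMBED_DIM, requires_grad=True)
    opt = torch.optim.Adam([z], lr=lr)
    for _ in range(steps):
        loss = preference_loss(feats_a @ z, feats_b @ z, prefers_a)
        opt.zero_grad()
        loss.backward()
        opt.step()
    return z.detach()

if __name__ == "__main__":
    # Synthetic preference queries: pairs of segment features and a binary preference label.
    feats_a = torch.randn(16, EMBED_DIM)
    feats_b = torch.randn(16, EMBED_DIM)
    prefers_a = feats_a.sum(dim=-1) > feats_b.sum(dim=-1)
    z_new = adapt_embedding(feats_a, feats_b, prefers_a)
    policy = PolicyDecoder()                      # in practice, the meta-trained decoder
    action_logits = policy(z_new, torch.randn(OBS_DIM))
    print("inferred embedding:", z_new)
    print("action logits:", action_logits)
```

In this sketch, adaptation touches only the low-dimensional embedding `z` rather than the policy weights, which is one way the few-shot, reward-free adaptation described above can be realized.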