Existing context-based offline meta-reinforcement learning (COMRL) methods primarily focus on task representation learning and on adaptation performance given a pre-collected context. They often assume that the adaptation context is collected by task-specific behavior policies or over multiple rounds of collection. In real applications, however, the context should be collected by a single policy in a one-shot manner to ensure efficiency and safety. We find that intrinsic context ambiguity across multiple tasks, together with out-of-distribution (OOD) issues caused by distribution shift, significantly degrades one-shot adaptation performance, a problem that has been largely overlooked in most COMRL research. To address it, we propose using heteroscedastic uncertainty in representation learning to identify ambiguous and OOD contexts, and we train an uncertainty-aware context-collecting policy for effective one-shot online adaptation. The proposed method can be integrated into various COMRL frameworks, including classifier-based, reconstruction-based, and contrastive learning-based approaches. Empirical evaluations on benchmark tasks show that our method improves one-shot adaptation performance by up to 36% and zero-shot adaptation performance by up to 34% over existing baseline COMRL methods.
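To make the core idea concrete, the following is a minimal sketch (not the authors' implementation) of heteroscedastic uncertainty in a context encoder: the encoder predicts an input-dependent variance alongside the task representation, and that variance can serve as a score for flagging ambiguous or OOD contexts. All names (`ContextEncoder`, `latent_dim`, the dimensions) are illustrative assumptions.

```python
# Sketch: context encoder with heteroscedastic (per-input) variance.
import torch
import torch.nn as nn

class ContextEncoder(nn.Module):
    def __init__(self, context_dim: int, latent_dim: int, hidden: int = 128):
        super().__init__()
        self.body = nn.Sequential(
            nn.Linear(context_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.mu_head = nn.Linear(hidden, latent_dim)      # task representation (mean)
        self.logvar_head = nn.Linear(hidden, latent_dim)  # heteroscedastic log-variance

    def forward(self, context: torch.Tensor):
        h = self.body(context)
        return self.mu_head(h), self.logvar_head(h)

def heteroscedastic_nll(mu, logvar, target):
    # Gaussian NLL with input-dependent variance; ambiguous or OOD contexts
    # are expected to receive larger predicted variance (higher uncertainty).
    return 0.5 * (logvar + (target - mu).pow(2) / logvar.exp()).mean()

# Usage: a per-context uncertainty signal that a context-collecting policy
# could consume (hypothetical dimensions).
encoder = ContextEncoder(context_dim=32, latent_dim=8)
ctx = torch.randn(16, 32)               # a batch of transition contexts
mu, logvar = encoder(ctx)
uncertainty = logvar.exp().sum(dim=-1)  # higher value = more ambiguous/OOD
```

The design choice here follows the standard heteroscedastic-regression recipe: instead of a fixed noise level, each context gets its own predicted variance, so the same head that learns the task representation also exposes an uncertainty estimate usable at adaptation time.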