Reward Translation via Reward Machine in Semi-Alignable MDPs

#1 Reward Translation via Reward Machine in Semi-Alignable MDPs [PDF] [Copy] [Kimi] [REL]

Authors: Yun Hua, Haosheng Chen, Wenhao Li, Bo Jin, Baoxiang Wang, Hongyuan Zha, Xiangfeng Wang

Addressing reward design complexities in deep reinforcement learning is facilitated by knowledge transfer across different domains. To this end, we define \textit{reward translation} to describe the cross-domain reward transfer problem. However, current methods struggle with non-pairable and non-time-alignable incompatible MDPs.This paper presents an adaptable reward translation framework \textit{neural reward translation} featuring \textit{semi-alignable MDPs}, which allows efficient reward translation under relaxed constraints while handling the intricacies of incompatible MDPs. Given the inherent difficulty of directly mapping semi-alignable MDPs and transferring rewards, we introduce an indirect mapping method through reward machines, created using limited human input or LLM-based automated learning.Graph-matching techniques establish links between reward machines from distinct environments, thus enabling cross-domain reward translation within semi-alignable MDP settings. This broadens the applicability of DRL across multiple domains. Experiments substantiate our approach's effectiveness in tasks under environments with semi-alignable MDPs.

Subject: ICML.2025 - Poster

YLTfcWxVVj@OpenReview

#1 Reward Translation via Reward Machine in Semi-Alignable MDPs [PDF] [Copy] [Kimi] [REL]