49g4c8MWHy@OpenReview

Total: 1

#1 Preference Controllable Reinforcement Learning with Advanced Multi-Objective Optimization

Authors: Yucheng Yang, Tianyi Zhou, Mykola Pechenizkiy, Meng Fang

Practical reinforcement learning (RL) usually requires agents to be optimized for multiple, potentially conflicting criteria, e.g., speed vs. safety. Although Multi-Objective RL (MORL) algorithms have been studied in previous works, their trained agents often cover only a limited set of Pareto-optimal solutions and lack precise control over the delicate trade-off among multiple objectives. Hence, the resulting agents are not versatile enough to align with customized requests from different users. To bridge this gap, we develop the "Preference Controllable (PC) RL" framework, which trains a preference-conditioned meta-policy that takes a user preference as input and controls the generated trajectories to lie within the preference region on the Pareto frontier. The PCRL framework is compatible with advanced Multi-Objective Optimization (MOO) algorithms that are rarely seen in previous MORL approaches. We also propose a novel preference-regularized MOO algorithm specifically for PCRL, and provide a comprehensive theoretical analysis to justify its convergence and preference controllability. We evaluate PCRL with different MOO algorithms against state-of-the-art MORL baselines in various challenging environments with up to six objectives. In these experiments, our proposed method exhibits significantly better controllability than existing approaches and can generate Pareto solutions with better diversity and utilities.
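As a rough illustration of the preference-conditioned meta-policy idea described in the abstract, below is a minimal PyTorch sketch in which the policy receives a preference weight vector over the objectives as an extra input; all names, dimensions, and the network layout are hypothetical and are not taken from the paper.

    import torch
    import torch.nn as nn

    class PreferenceConditionedPolicy(nn.Module):
        # A policy network conditioned on both the state and a preference vector,
        # with one non-negative weight per objective that sums to 1.
        def __init__(self, state_dim, num_objectives, action_dim, hidden=64):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(state_dim + num_objectives, hidden),
                nn.ReLU(),
                nn.Linear(hidden, action_dim),
            )

        def forward(self, state, preference):
            # Concatenate state and preference so one network serves all trade-offs.
            return self.net(torch.cat([state, preference], dim=-1))

    # The same trained policy can be steered toward different trade-offs at test time:
    policy = PreferenceConditionedPolicy(state_dim=8, num_objectives=2, action_dim=3)
    state = torch.randn(1, 8)
    prefer_speed = torch.tensor([[0.9, 0.1]])
    prefer_safety = torch.tensor([[0.1, 0.9]])
    print(policy(state, prefer_speed))
    print(policy(state, prefer_safety))

This sketch only shows how a preference vector can condition a single network; the paper's contribution additionally concerns how such a policy is trained with preference-regularized MOO so that the generated trajectories actually land in the requested region of the Pareto frontier.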

Subject: ICML.2025 - Poster