When Maximum Entropy Misleads Policy Optimization

FRFuvBRueA@OpenReview

Authors: Ruipeng Zhang, Ya-Chien Chang, Sicun Gao

The Maximum Entropy Reinforcement Learning (MaxEnt RL) framework is a leading approach for achieving efficient learning and robust performance across many RL tasks. However, MaxEnt methods have also been shown to struggle in practice with performance-critical control problems that non-MaxEnt algorithms can learn successfully. In this work, we analyze how the trade-off between robustness and optimality affects the performance of MaxEnt algorithms in complex control tasks: while entropy maximization enhances exploration and robustness, it can also mislead policy optimization, leading to failure in tasks that require precise, low-entropy policies. Through experiments on a variety of control problems, we concretely demonstrate this misleading effect. Our analysis leads to a better understanding of how to balance reward design and entropy maximization in challenging control problems.
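As a toy illustration of the misleading effect described above (a hypothetical single-step example, not one of the paper's experiments), consider a bandit-style decision with one precise, highest-reward action and a broad set of slightly worse actions. For the single-step MaxEnt objective E[r] + alpha * H(pi), the maximizing policy is the Boltzmann distribution pi(a) ∝ exp(r(a) / alpha). The function name `soft_optimal_policy`, the reward values, and the temperatures below are assumptions chosen for illustration only.

```python
import math


def soft_optimal_policy(rewards, alpha):
    """Maximizer of E[r] + alpha * H(pi) for a single-step (bandit) problem.

    The solution is pi(a) proportional to exp(r(a) / alpha). Subtracting the
    max reward before exponentiating keeps the computation numerically stable.
    """
    m = max(rewards)
    weights = [math.exp((r - m) / alpha) for r in rewards]
    z = sum(weights)
    return [w / z for w in weights]


# Hypothetical toy task: action 0 is the single "precise" action with the
# highest reward; actions 1..20 form a broad region of slightly worse actions.
rewards = [1.0] + [0.8] * 20

for alpha in (0.01, 0.1, 1.0):
    pi = soft_optimal_policy(rewards, alpha)
    expected_reward = sum(p * r for p, r in zip(pi, rewards))
    print(f"alpha={alpha}: P(precise action)={pi[0]:.3f}, "
          f"expected reward={expected_reward:.3f}")
```

With a small temperature the policy concentrates on the precise action, but as alpha grows the entropy bonus pulls probability mass toward the many near-optimal actions, so the MaxEnt-optimal policy selects the best action only rarely even though its reward is strictly higher.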

Subject: ICML.2025 - Poster