Recent work has formalized the reward hypothesis through the lens of expected utility theory by interpreting reward as utility. Hausner's foundational work showed that dropping the continuity axiom leads to a generalization of expected utility theory in which utilities are lexicographically ordered vectors of arbitrary dimension. In this paper, we extend this result by identifying a simple and practical condition under which preferences in a Markov Decision Process (MDP) cannot be represented by scalar rewards and instead necessitate a 2-dimensional reward function. We provide a full characterization of such reward functions, as well as of the general d-dimensional case under a memorylessness assumption on preferences. Furthermore, we show that optimal policies in this setting retain many desirable properties of their scalar-reward counterparts, whereas in the Constrained MDP (CMDP) setting, another common multiobjective formulation, they do not.
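For concreteness, the lexicographic ordering referenced above can be sketched as follows for the 2-dimensional case (this is the standard definition; the symbols u and v are illustrative and not taken from the paper):

\[
(u_1, u_2) \succ_{\mathrm{lex}} (v_1, v_2)
\quad\Longleftrightarrow\quad
u_1 > v_1 \;\text{ or }\; \bigl(u_1 = v_1 \text{ and } u_2 > v_2\bigr).
\]

Under this order, the first reward component is compared first, and the second component only breaks ties.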