Cx5aNPycdO@OpenReview

#1 Learning Utilities from Demonstrations in Markov Decision Processes

Authors: Filippo Lazzati, Alberto Maria Metelli

Although it is well known that humans commonly engage in *risk-sensitive* behaviors in the presence of stochasticity, most Inverse Reinforcement Learning (IRL) models assume a *risk-neutral* agent. As a result, these models $(i)$ introduce model misspecification and $(ii)$ do not permit direct inference of the observed agent’s risk attitude, which can be useful in many applications. In this paper, we propose a novel model of behavior to cope with these issues. By allowing for risk sensitivity, our model alleviates $(i)$, and by explicitly representing risk attitudes through (learnable) *utility* functions, it solves $(ii)$. We then characterize the partial identifiability of an agent’s utility under the new model and note that demonstrations from multiple environments mitigate the problem. We devise two provably efficient algorithms for learning utilities in a finite-data regime, and we conclude with proof-of-concept experiments to validate *both* our model and our algorithms.

Subject: ICML.2025 - Poster