Goal Alignment: Re-analyzing Value Alignment Problems Using Human-Aware AI


Authors: Malek Mechergui, Sarath Sreedharan

While the question of misspecified objectives has received much attention in recent years, most work in this area focuses primarily on challenges related to the complexity of the objective specification mechanism (for example, the use of reward functions). However, this complexity is just one of many reasons why a user may misspecify their objective. A foundational cause of misspecification that these works overlook is the inherent asymmetry between human expectations about the agent's behavior and the behavior the agent actually generates for the specified objective. To address this, we propose a novel formulation of the objective misspecification problem that builds on the human-aware planning literature, which was originally introduced to support explanation and explicable behavior generation. Additionally, we propose a first-of-its-kind interactive algorithm that can use information generated under incorrect beliefs about the agent to determine the user's true underlying goal.

Subject: AAAI.2024 - Humans and AI

28875@AAAI