Ensuring that AI systems do what we, as humans, actually want them to do is one of the biggest open research challenges in AI alignment and safety. My research seeks to address this challenge directly by enabling AI systems to interact with humans to learn aligned and robust behaviors. The behavior of robots and other AI systems is often the result of optimizing a reward function. However, manually designing good reward functions is highly challenging and error-prone, even for domain experts: consider trying to write down a reward function that captures good driving behavior, or how you like your bed made in the morning. While reward functions for such tasks are difficult to specify by hand, human feedback in the form of demonstrations or preferences is often much easier to obtain. However, human data is often difficult to interpret due to ambiguity and noise, so it is critical that AI systems account for epistemic uncertainty over the human's true intent. My talk will give an overview of my lab's progress in three fundamental research areas: (1) efficiently maintaining uncertainty over human intent, (2) directly optimizing behavior to be robust to that uncertainty, and (3) actively querying for additional human input to reduce it.
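To make the idea of maintaining uncertainty over human intent concrete, here is a minimal illustrative sketch (not taken from the talk): Bayesian inference over a one-dimensional reward weight from pairwise trajectory preferences, using a Bradley-Terry likelihood and a grid approximation of the posterior. The feature values, preference data, and function names are all hypothetical choices for illustration.

```python
import numpy as np

def bradley_terry_likelihood(w, f_preferred, f_other):
    """P(preferred > other | w) under a Bradley-Terry preference model,
    where each trajectory is summarized by a single scalar feature."""
    return 1.0 / (1.0 + np.exp(-(w * f_preferred - w * f_other)))

# Candidate reward weights: a grid approximation of the posterior
# over the human's (unknown) reward parameter.
candidates = np.linspace(-2.0, 2.0, 81)
posterior = np.ones_like(candidates)  # uniform prior

# Hypothetical preference data: pairs of (feature of the trajectory the
# human preferred, feature of the trajectory they rejected).
preferences = [(1.0, 0.2), (0.8, -0.5), (0.9, 0.1)]

# Bayesian update: multiply in the likelihood of each observed preference,
# then renormalize so the posterior sums to one.
for f_pref, f_other in preferences:
    posterior *= bradley_terry_likelihood(candidates, f_pref, f_other)
posterior /= posterior.sum()

# The posterior mean summarizes the inferred intent; the spread quantifies
# the remaining epistemic uncertainty that robust optimization or active
# querying would then target.
mean_w = (candidates * posterior).sum()
std_w = np.sqrt(((candidates - mean_w) ** 2 * posterior).sum())
print(f"posterior mean weight: {mean_w:.2f}, std: {std_w:.2f}")
```

In this toy setting the human always preferred the trajectory with the larger feature value, so the posterior concentrates on positive weights; with few preferences the spread stays wide, which is exactly the uncertainty that areas (2) and (3) of the talk address.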