Not a Number: Identifying Instance Features for Capability-Oriented Evaluation


Authors: Ryan Burnell, John Burden, Danaja Rutar, Konstantinos Voudouris, Lucy Cheke, José Hernández-Orallo

In AI evaluation, performance is often calculated by averaging across various instances. But to fully understand the capabilities of an AI system, we need to understand the factors that cause its pattern of success and failure. In this paper, we present a new methodology to identify and build informative instance features that can provide explanatory and predictive power to analyse the behaviour of AI systems more robustly. The methodology builds on relevant features that should relate monotonically with success, and represents patterns of performance in a new type of plot known as ‘agent characteristic grids’. We illustrate this methodology with the Animal-AI competition as a representative example of how we can revisit existing competitions and benchmarks in AI, even when evaluation data is sparse. Agents with the same average performance can show very different patterns of performance at the instance level. With this methodology, these patterns can be visualised, explained and predicted, progressing towards a capability-oriented evaluation rather than relying on a less informative average performance score.
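As a rough illustration of why a single average can hide instance-level structure, the Python sketch below uses made-up outcomes for two hypothetical agents over two binned instance features. The agent data, the 2x2 binning (0 = easy, 1 = hard), and the helper names `average_score`, `characteristic_grid` and `is_monotone` are assumptions for illustration only; this is not the paper's implementation of agent characteristic grids, just a minimal grid of success rates per feature cell plus a simple check that success does not rise as difficulty increases.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical toy outcomes: (feature_a_level, feature_b_level, succeeded).
# Levels are illustrative difficulty bins (0 = easy, 1 = hard), not real data.
AGENT_X = [(0, 0, True), (0, 1, True), (1, 0, False), (1, 1, False)]
AGENT_Y = [(0, 0, True), (0, 1, False), (1, 0, True), (1, 1, False)]


def average_score(results):
    """The single aggregate number that instance-level analysis goes beyond."""
    return mean(1.0 if ok else 0.0 for _, _, ok in results)


def characteristic_grid(results):
    """Success rate in each cell defined by two discrete instance features."""
    cells = defaultdict(list)
    for a, b, ok in results:
        cells[(a, b)].append(1.0 if ok else 0.0)
    return {cell: mean(vals) for cell, vals in sorted(cells.items())}


def is_monotone(grid):
    """True if success never rises when either feature level increases by one."""
    for (a, b), rate in grid.items():
        for neighbour in ((a + 1, b), (a, b + 1)):
            if neighbour in grid and grid[neighbour] > rate:
                return False
    return True


if __name__ == "__main__":
    for name, results in (("X", AGENT_X), ("Y", AGENT_Y)):
        grid = characteristic_grid(results)
        print(name, average_score(results), grid, is_monotone(grid))
```

In this toy case both agents average 0.5, yet agent X fails whenever feature a is hard while agent Y fails whenever feature b is hard; only the per-cell view separates them. Checking monotonicity per cell mirrors, in a simplified way, the idea that useful features should relate monotonically with success.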

Subject: IJCAI.2022 - Machine Learning


