QtbyoRxyNx@OpenReview

#1 "Who experiences large model decay and why?" A Hierarchical Framework for Diagnosing Heterogeneous Performance Drift [PDF] [Copy] [Kimi] [REL]

Authors: Harvineet Singh, Fan Xia, Alexej Gossmann, Andrew Chuang, Julian Hong, Jean Feng

Machine learning (ML) models frequently experience performance degradation when deployed in new contexts. Such degradation is rarely uniform: some subgroups may suffer large performance decay while others may not. Understanding where and how large differences in performance arise is critical for designing *targeted* corrective actions that mitigate decay for the most affected subgroups while minimizing any unintended effects. Current approaches do not provide such detailed insight: they either (i) explain how *average* performance shifts arise or (ii) identify adversely affected subgroups without explaining how the decay arose. To address this gap, we introduce a **S**ubgroup-scanning **H**ierarchical **I**nference **F**ramework for performance drif**T** (SHIFT). SHIFT first asks "Is there any subgroup with unacceptably large performance decay due to covariate/outcome shifts?" (*Where?*) and, if so, dives deeper to ask "Can we explain this using more detailed, variable(-subset)-specific shifts?" (*How?*). In real-world experiments, we find that SHIFT identifies interpretable subgroups affected by performance decay and suggests targeted actions that effectively mitigate the decay.

Subject: ICML.2025 - Poster
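
The abstract only sketches SHIFT's two-step logic: first locate *where* decay concentrates (subgroup scanning), then explain *how* it arose. The snippet below is a minimal, hypothetical illustration of the first step only: it compares a model's accuracy between a source and a target sample across candidate subgroups and flags those whose decay exceeds a threshold. The simulated data, column names, and `min_decay` threshold are invented for the example; this is not the SHIFT procedure or its hierarchical hypothesis tests.

```python
# Hypothetical sketch of the "Where?" step: scan candidate subgroups for
# performance decay between a source and a target dataset. Illustration only,
# not the SHIFT method; data columns and threshold are invented.
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

def make_data(n, shift_group=False):
    """Simulate whether a fixed model's predictions are correct, per subgroup."""
    group = rng.integers(0, 2, size=n)            # subgroup indicator (e.g., site A vs. B)
    base_acc = np.where(group == 1, 0.85, 0.90)   # per-subgroup accuracy in the source context
    if shift_group:                               # in the target context, subgroup 1 decays
        base_acc = np.where(group == 1, 0.70, 0.89)
    correct = rng.random(n) < base_acc            # 1 if the model's prediction is correct
    return pd.DataFrame({"group": group, "correct": correct.astype(int)})

source = make_data(5_000)                         # reference (pre-deployment) data
target = make_data(5_000, shift_group=True)       # new-context (deployed) data

def scan_subgroups(source, target, by, min_decay=0.05):
    """Flag subgroups whose accuracy drops by more than `min_decay`."""
    src_acc = source.groupby(by)["correct"].mean()
    tgt_acc = target.groupby(by)["correct"].mean()
    decay = src_acc - tgt_acc
    return decay[decay > min_decay]

print(scan_subgroups(source, target, by="group"))
# Expected: subgroup 1 is flagged with roughly 0.15 accuracy decay; subgroup 0 is not.
```

In this toy setup the scan surfaces the adversely affected subgroup; SHIFT additionally tests whether such decay is statistically significant and then attributes it to covariate/outcome shifts in specific variables or variable subsets, which this sketch does not attempt.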