41052@AAAI

#1 Beyond Patches: Mining Interpretable Part-Prototypes for Explainable AI

Authors: Mahdi Alehdaghi, Rajarshi Bhattacharya, Pourya Shamsolmoali, Rafael M. O. Cruz, Eric Granger

As AI systems become more capable, it is important that their decisions are understandable and aligned with human expectations. A key challenge is the lack of interpretability in deep models. Existing methods such as Grad-CAM generate heatmaps but provide limited conceptual insight, while prototype-based approaches offer example-based explanations but often rely on rigid region selection and lack semantic consistency. To address these limitations, we propose PCMNet, a Part-Prototypical Concept Mining Network that learns human-comprehensible prototypes from meaningful regions without extra supervision. By clustering these prototypes into concept groups and extracting concept activation vectors, PCMNet provides structured, concept-level explanations and enhances robustness under occlusion and adversarial conditions, both of which are critical for building reliable and aligned AI systems. Experiments across multiple benchmarks show that PCMNet outperforms state-of-the-art methods in interpretability, stability, and robustness. This work contributes to AI alignment by enhancing transparency, controllability, and trustworthiness in modern AI systems.
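The abstract does not give implementation details, so the following is only a minimal sketch of the general idea of grouping prototype vectors into concepts and scoring a feature against them. It is not PCMNet's actual procedure. Everything here is an assumption made for illustration: the helper names (`kmeans`, `concept_vectors`, `concept_scores`), the use of plain k-means, the toy 2-D prototypes, and the simplification of taking each cluster's unit-normalized centroid as a stand-in for a concept activation vector.

```python
import numpy as np

def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means (illustrative). Returns (centroids, labels)."""
    rng = np.random.default_rng(seed)
    # Start from k distinct points; fancy indexing returns a copy.
    centroids = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(iters):
        # Distance from every point to every centroid, shape (n, k).
        dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            members = points[labels == j]
            if len(members) > 0:  # keep the old centroid if a cluster is empty
                centroids[j] = members.mean(axis=0)
    return centroids, labels

def concept_vectors(prototypes, k):
    """Group prototypes into k concepts; return unit concept directions and labels.

    Assumption: the normalized cluster centroid serves as the concept direction.
    """
    centroids, labels = kmeans(prototypes, k)
    norms = np.linalg.norm(centroids, axis=1, keepdims=True)
    return centroids / np.clip(norms, 1e-12, None), labels

def concept_scores(feature, vectors):
    """Cosine similarity between one feature vector and each concept direction."""
    feature = feature / max(np.linalg.norm(feature), 1e-12)
    return vectors @ feature

# Hypothetical toy data: six 2-D "part prototypes" forming two obvious groups.
prototypes = np.array([
    [1.0, 0.0], [0.9, 0.1], [1.0, 0.1],
    [0.0, 1.0], [0.1, 0.9], [0.1, 1.0],
])
vectors, labels = concept_vectors(prototypes, k=2)
scores = concept_scores(np.array([2.0, 0.1]), vectors)
print("concept of each prototype:", labels)
print("concept scores for the query feature:", scores)
```

In this toy setup, the concept with the highest score is the one whose prototypes point in the same direction as the query feature, which is the kind of concept-level readout the abstract describes at a high level.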

Subject: AAAI.2026 - Special Track on AI Alignment