As AI systems become more capable, it is increasingly important that their decisions be understandable and aligned with human expectations. A key obstacle is the limited interpretability of deep models: attribution methods such as Grad-CAM produce heatmaps but offer little conceptual insight, while prototype-based approaches give example-based explanations but often rely on rigid region selection and lack semantic consistency. To address these limitations, we propose PCMNet, a Part-Prototypical Concept Mining Network that learns human-comprehensible prototypes from semantically meaningful regions without additional supervision. By clustering these prototypes into concept groups and extracting concept activation vectors, PCMNet provides structured, concept-level explanations and improves robustness under occlusion and adversarial perturbations, both of which are critical for building reliable and aligned AI systems. Experiments on multiple benchmarks show that PCMNet outperforms state-of-the-art methods in interpretability, stability, and robustness. This work contributes to AI alignment by enhancing the transparency, controllability, and trustworthiness of modern AI systems.
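To make the concept-mining step concrete, the following is a minimal sketch, not PCMNet's actual implementation: it assumes the learned part prototypes are available as embedding vectors, clusters them with k-means into concept groups, and takes each unit-normalized cluster centroid as a concept activation vector. The function names (`mine_concepts`, `concept_scores`) and the cosine-similarity scoring are illustrative assumptions, not details taken from the paper.

```python
import numpy as np
from sklearn.cluster import KMeans

def mine_concepts(prototype_embeddings: np.ndarray, num_concepts: int):
    """Group part-prototype embeddings into concept clusters.

    prototype_embeddings: (num_prototypes, dim) array of learned
    part-prototype vectors (a hypothetical input format).
    Returns per-prototype cluster labels and one concept activation
    vector (CAV) per concept, taken here as the normalized centroid.
    """
    kmeans = KMeans(n_clusters=num_concepts, n_init=10, random_state=0)
    labels = kmeans.fit_predict(prototype_embeddings)
    centroids = kmeans.cluster_centers_
    cavs = centroids / np.linalg.norm(centroids, axis=1, keepdims=True)
    return labels, cavs

def concept_scores(feature: np.ndarray, cavs: np.ndarray) -> np.ndarray:
    """Score an image-level feature against each concept by cosine
    similarity with its CAV (one common choice; an assumption here)."""
    f = feature / np.linalg.norm(feature)
    return cavs @ f

if __name__ == "__main__":
    # Random stand-in data: 64 prototypes with 128-d embeddings.
    rng = np.random.default_rng(0)
    protos = rng.normal(size=(64, 128))
    labels, cavs = mine_concepts(protos, num_concepts=8)
    scores = concept_scores(rng.normal(size=128), cavs)
    print(labels[:10], scores.round(3))
```

Under these assumptions, a concept-level explanation reduces to reporting which concept clusters fire most strongly for an image, rather than pointing at a single raw heatmap.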