Assessing Reliability of Symbol Detection in Concept Bottleneck Models

Authors: Javier Fumanal-Idocin, Javier Andreu-Perez

Concept Bottleneck Models (CBMs) are a relevant tool for explainable Artificial Intelligence because they make their predictions through human-interpretable symbols. However, high task accuracy does not guarantee that these symbols are detected faithfully: jointly trained CBMs may encode task-specific shortcuts in the bottleneck, making their explanations unreliable. In this paper, we study concept-detection reliability by swapping independently trained concept detectors and classification heads that share the same symbolic vocabulary. We use the resulting performance degradation, concept-level metrics, and symbol-wise uncertainty estimates to identify concepts that are especially prone to spurious firing. Finally, we propose a reliability-aware training strategy in which a shared concept detector is optimized with multiple classification heads and penalized for relying on globally or instance-wise unreliable symbols. On CUB-200-2011 with full concept supervision, detectors and heads are almost freely interchangeable (swap drop below one accuracy point, relative retention above $99\%$, and no concept detected below chance), whereas on a controlled synthetic task we show that, as the concept-supervision weight is reduced, models keep near-perfect task accuracy while swapped accuracy and agreement with the ground-truth concepts collapse to chance. Our reliability-aware training substantially mitigates this leakage, roughly doubling swap accuracy in the leaky regime.
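The swap evaluation described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the function name `swap_report`, the toy detectors and heads, and the exact definitions of "swap drop" (native minus swapped accuracy) and "relative retention" (swapped divided by native accuracy) are assumptions made for illustration. Detectors map inputs to concept vectors, heads map concept vectors to labels, and every detector is paired with every head.

```python
import numpy as np

def accuracy(pred, y):
    """Fraction of predictions that match the labels."""
    return float(np.mean(pred == y))

def swap_report(detectors, heads, x, y):
    """Evaluate every detector/head pairing (hypothetical helper).

    detectors[i] and heads[i] are assumed to have been trained together;
    all other pairings are "swapped". Definitions below are assumptions.
    """
    n = len(detectors)
    if n < 2 or n != len(heads):
        raise ValueError("need at least two detector/head pairs")
    acc = np.zeros((n, n))
    for i, det in enumerate(detectors):
        concepts = det(x)  # concept predictions of detector i
        for j, head in enumerate(heads):
            acc[i, j] = accuracy(head(concepts), y)
    native = float(np.mean(np.diag(acc)))   # matched pairs
    off_diag = ~np.eye(n, dtype=bool)
    swapped = float(np.mean(acc[off_diag]))  # mismatched pairs
    return {
        "native": native,
        "swapped": swapped,
        "swap_drop": native - swapped,
        "retention": swapped / native if native > 0 else float("nan"),
    }

# Toy data: two binary ground-truth concepts; the label equals concept 0.
x = np.array([[0, 1], [1, 0], [1, 1], [0, 0]])
y = x[:, 0]

# A faithful pair: the detector emits the true concepts.
faithful_det = lambda inp: inp
faithful_head = lambda c: c[:, 0]

# A "leaky" pair: the detector uses a private code (flipped bits) that
# only its own head knows how to decode.
leaky_det = lambda inp: 1 - inp
leaky_head = lambda c: 1 - c[:, 0]

report = swap_report([faithful_det, leaky_det],
                     [faithful_head, leaky_head], x, y)
print(report)
```

In this toy setting both pairs are perfectly accurate on their own, yet the swapped pairings fail, which is the signature of a bottleneck that does not carry the shared symbolic vocabulary. Two faithful pairs would instead show no drop under swapping.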

Subjects: Machine Learning, Computer Vision and Pattern Recognition, Symbolic Computation

Publish: 2026-06-15 10:38:49 UTC

2606.16535
