Efficient Discovery of Approximate Causal Abstractions via Neural Mechanism Sparsification

Neural networks are hypothesized to implement interpretable causal mechanisms, yet verifying this requires finding a causal abstraction -- a simpler, high-level Structural Causal Model (SCM) faithful to the network under interventions. Discovering such abstractions is hard: it typically demands brute-force interchange interventions or retraining. We reframe the problem by viewing structured pruning as a search over approximate abstractions. Treating a trained network as a deterministic SCM, we derive an Interventional Risk objective whose second-order expansion yields closed-form criteria for replacing units with constants or folding them into neighbors. Under uniform curvature, our score reduces to activation variance, recovering variance-based pruning as a special case while clarifying when it fails. The resulting procedure efficiently extracts sparse, intervention-faithful abstractions from pretrained networks, which we validate via interchange interventions.
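As a rough illustration of the variance special case mentioned above, the sketch below scores the hidden units of a toy two-layer ReLU network by activation variance, then replaces the lowest-variance unit with a constant by folding its mean activation into the next layer's bias. This is not the paper's Interventional Risk objective or its curvature-aware criteria; the network, the weights, and every function name here (`hidden`, `output`, `activation_stats`, `replace_with_constant`) are hypothetical choices made for the example.

```python
import random
from statistics import fmean, pvariance

def hidden(x, W1, b1):
    """ReLU hidden activations for one input vector x."""
    return [max(0.0, sum(w * xi for w, xi in zip(row, x)) + b)
            for row, b in zip(W1, b1)]

def output(h, W2, b2):
    """Linear readout from hidden activations h."""
    return [sum(w * hi for w, hi in zip(row, h)) + b
            for row, b in zip(W2, b2)]

def activation_stats(X, W1, b1):
    """Per-unit mean and variance of hidden activations over the sample X."""
    columns = list(zip(*(hidden(x, W1, b1) for x in X)))
    return [fmean(c) for c in columns], [pvariance(c) for c in columns]

def replace_with_constant(W1, b1, W2, b2, j, c):
    """Drop hidden unit j and fold its constant value c into the output bias."""
    new_b2 = [b + row[j] * c for row, b in zip(W2, b2)]
    new_W2 = [row[:j] + row[j + 1:] for row in W2]
    return W1[:j] + W1[j + 1:], b1[:j] + b1[j + 1:], new_W2, new_b2

# Toy network (assumed for illustration): hidden unit 1 ignores its input,
# so its activation is constant and its variance is zero.
random.seed(0)
X = [[random.gauss(0, 1), random.gauss(0, 1)] for _ in range(200)]
W1 = [[1.0, -1.0], [0.0, 0.0], [0.5, 2.0]]
b1 = [0.0, 0.5, -0.1]
W2 = [[1.0, 2.0, -1.0]]
b2 = [0.1]

means, variances = activation_stats(X, W1, b1)
j = min(range(len(variances)), key=variances.__getitem__)  # lowest-variance unit
pW1, pb1, pW2, pb2 = replace_with_constant(W1, b1, W2, b2, j, means[j])

# Largest output disagreement between the original and the pruned network.
max_gap = max(
    abs(output(hidden(x, W1, b1), W2, b2)[0]
        - output(hidden(x, pW1, pb1), pW2, pb2)[0])
    for x in X
)
```

In this toy case the selected unit is exactly constant, so the replacement changes the outputs only by floating-point rounding. For a unit with nonzero variance the same replacement is only an approximation, and, as the abstract notes, a variance score can be misleading when curvature is not uniform across units.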

Subjects: Machine Learning, Artificial Intelligence

Publish: 2026-02-27 18:35:10 UTC

2602.24266
