2025.naacl-long.618@ACL

Total: 1

#1 SCIURus: Shared Circuits for Interpretable Uncertainty Representations in Language Models

Authors: Carter Teplica, Yixin Liu, Arman Cohan, Tim G. J. Rudner

We investigate the mechanistic sources of uncertainty in large language models (LLMs), an area with important implications for language model reliability and trustworthiness. To do so, we conduct a series of experiments designed to identify whether the factuality of generated responses and a model’s uncertainty originate in separate or shared circuits in the model architecture. We approach this question by adapting the well-established mechanistic interpretability techniques of causal tracing and zero-ablation to study the effect of different circuits on LLM generations. Our experiments on eight different models and five datasets, representing tasks predominantly requiring factual recall, provide strong evidence that a model’s uncertainty is produced in the same parts of the network that are responsible for the factuality of generated responses.
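To make the zero-ablation technique mentioned above concrete, below is a minimal sketch of a zero-ablation intervention on a single transformer component, assuming a GPT-2-style model loaded via Hugging Face transformers. The model name, layer index, and prompt are illustrative assumptions, not the paper's actual experimental setup; the paper's own adaptation of the method may differ.

import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

model = GPT2LMHeadModel.from_pretrained("gpt2")
tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model.eval()

LAYER = 6  # hypothetical layer to ablate

def zero_ablate_hook(module, inputs, output):
    # Replace the MLP block's output with zeros, severing its
    # contribution to the residual stream at this layer.
    return torch.zeros_like(output)

prompt = "The capital of France is"  # placeholder factual-recall prompt
batch = tokenizer(prompt, return_tensors="pt")

# Clean run: next-token logits with the network intact.
with torch.no_grad():
    clean_logits = model(**batch).logits[0, -1]

# Ablated run: same input, with one MLP block zeroed out via a forward hook.
handle = model.transformer.h[LAYER].mlp.register_forward_hook(zero_ablate_hook)
with torch.no_grad():
    ablated_logits = model(**batch).logits[0, -1]
handle.remove()

# A large shift between the clean and ablated next-token distributions
# suggests the ablated component matters for this prediction.
shift = (clean_logits.softmax(-1) - ablated_logits.softmax(-1)).abs().sum().item()
print(f"Total variation shift after ablation: {shift:.4f}")

Repeating this intervention across layers and components, and comparing the effect on both the factuality of the generated answer and the model's expressed uncertainty, is one way to probe whether the two behaviors are driven by the same circuits.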

Subject: NAACL.2025 - Long Papers