NormXLogit: The Head-on-Top Never Lies


Authors: Sina Abbasi, Mohammad Reza Modarres, Mohammad Taher Pilehvar

With new large language models (LLMs) emerging frequently, it is important to consider the potential value of model-agnostic approaches that can provide interpretability across a variety of architectures. While recent advances in LLM interpretability show promise, many rely on complex, model-specific methods with high computational costs. To address these limitations, we propose NormXLogit, a novel technique for assessing the significance of individual input tokens. This method operates based on the input and output representations associated with each token. First, we demonstrate that the norm of word embeddings can be utilized as a measure of token importance. Second, we reveal a significant relationship between a token’s importance and how predictive its representation is of the model’s final output. Extensive analyses indicate that our approach outperforms existing gradient-based methods in terms of faithfulness and offers competitive performance compared to leading architecture-specific techniques.
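The scoring idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes that a token's score is the product of its input embedding norm and the target-class logit obtained by applying the model's head directly to that token's final representation. The function name `norm_x_logit_scores`, its parameters, and the toy arrays are hypothetical and chosen only for illustration; the paper may combine or normalize the two terms differently.

```python
import numpy as np


def norm_x_logit_scores(input_embeddings, final_hidden, head_weight, head_bias, target_class):
    """Hypothetical per-token importance: embedding norm times head logit.

    input_embeddings: (seq_len, d_in)  input word embeddings of the tokens
    final_hidden:     (seq_len, d_out) final-layer representation of each token
    head_weight:      (num_classes, d_out) weights of the head on top
    head_bias:        (num_classes,) bias of the head on top
    target_class:     index of the class (or vocabulary item) being explained
    """
    # L2 norm of each token's input embedding, shape (seq_len,)
    norms = np.linalg.norm(input_embeddings, axis=-1)
    # Apply the head to every token's final representation, shape (seq_len, num_classes)
    per_token_logits = final_hidden @ head_weight.T + head_bias
    # Assumption: the two signals are combined by a simple product
    return norms * per_token_logits[:, target_class]


# Toy example with made-up numbers: 3 tokens, 2 hidden dimensions, 2 classes.
embeddings = np.array([[3.0, 4.0], [0.0, 1.0], [1.0, 0.0]])
hidden = np.array([[1.0, 0.0], [0.0, 2.0], [1.0, 1.0]])
weight = np.eye(2)
bias = np.zeros(2)

scores = norm_x_logit_scores(embeddings, hidden, weight, bias, target_class=0)
print(scores)
```

Because the sketch only needs input embeddings, final representations, and the head, it involves no gradients and no access to intermediate layers, which matches the model-agnostic framing of the abstract.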

Subject: EMNLP.2025 - Main

2025.emnlp-main.1769@ACL
