7v2loOdcLH@OpenReview


#1 A Lens into Interpretable Transformer Mistakes via Semantic Dependency

Authors: Ruo-Jing Dong, Yu Yao, Bo Han, Tongliang Liu

Semantic dependency refers to the relationship between words in a sentence where the meaning of one word depends on another, which is important for natural language understanding. In this paper, we investigate the role of semantic dependencies in how transformer models answer questions, which we achieve by analyzing how token values shift in response to changes in semantics. Through extensive experiments on models including the BERT series, GPT, and LLaMA, we uncover the following key findings: (1) most tokens primarily retain their original semantic information even as they propagate through multiple layers; (2) models can encode truthful semantic dependencies in tokens at the final layer; (3) mistakes in model answers often stem from specific tokens encoded with incorrect semantic dependencies. Furthermore, we find that correcting these errors by directly adjusting parameters is challenging, because the same parameters can encode both correct and incorrect semantic dependencies depending on the context. Our findings provide insights into the causes of incorrect information generation in transformers and can inform the future development of robust and reliable models.
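The abstract describes probing how a token's representation shifts when the semantics it depends on change. Below is a minimal, illustrative sketch of such a probe, assuming bert-base-uncased and the HuggingFace transformers API; the sentence pair, the probe design, and the cosine-similarity measure are assumptions for illustration, not the authors' implementation.

# Illustrative sketch (not the authors' code): measure how a target token's
# hidden state shifts across layers when a word it semantically depends on changes.
# Assumes bert-base-uncased from the HuggingFace transformers library.
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased", output_hidden_states=True)
model.eval()

def layerwise_states(sentence, target_word):
    """Return the target word's hidden state at every layer (embedding layer included)."""
    enc = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        out = model(**enc)
    # Locate the first sub-token of the target word in the input.
    target_id = tokenizer.encode(target_word, add_special_tokens=False)[0]
    pos = (enc["input_ids"][0] == target_id).nonzero()[0].item()
    # out.hidden_states is a tuple of (num_layers + 1) tensors of shape [batch, seq, dim].
    return [h[0, pos] for h in out.hidden_states]

# Two hypothetical sentences that differ only in the context the target token depends on.
states_a = layerwise_states("The bank approved the loan.", "bank")
states_b = layerwise_states("The bank eroded after the flood.", "bank")

# Per-layer cosine similarity: lower values suggest the token's representation has
# absorbed the changed semantic dependency at that depth.
for layer, (a, b) in enumerate(zip(states_a, states_b)):
    sim = torch.nn.functional.cosine_similarity(a, b, dim=0).item()
    print(f"layer {layer:2d}: cosine similarity = {sim:.3f}")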

Subject: ICML.2025 - Poster