2502.14671

Explanations of Large Language Models Explain Language Representations in the Brain

Authors: Maryam Rahimi, Yadollah Yaghoobzadeh, Mohammad Reza Daliri

Large language models (LLMs) not only exhibit human-like performance but also share computational principles with the brain's language processing mechanisms. While prior research has focused on mapping LLMs' internal representations to neural activity, we propose a novel approach using explainable AI (XAI) to strengthen this link. Applying attribution methods, we quantify the influence of preceding words on LLMs' next-word predictions and use these explanations to predict fMRI data from participants listening to narratives. We find that attribution methods robustly predict brain activity across the language network, revealing a hierarchical pattern: explanations from early layers align with the brain's initial language processing stages, while later layers correspond to more advanced stages. Additionally, layers with greater influence on next-word prediction, reflected in higher attribution scores, demonstrate stronger brain alignment. These results underscore XAI's potential for exploring the neural basis of language and suggest brain alignment for assessing the biological plausibility of explanation methods.
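
The pipeline the abstract describes (attribute a next-word prediction to the preceding words, then use those attributions as features for predicting brain responses) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration, not the authors' method: the abstract does not name the attribution methods, the language model, or the regression used, so this sketch substitutes an erasure-style attribution on a toy linear next-word model, a ridge regression, and a synthetic "voxel" signal that is constructed to depend linearly on the attributions. All function names are hypothetical.

```python
import math
import random


def softmax(logits):
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def target_logprob(context, W, target):
    # Toy stand-in for a language model (assumption): logits are a linear
    # readout W of the summed context-word embeddings.
    dim = len(W[0])
    pooled = [sum(vec[d] for vec in context) for d in range(dim)]
    logits = [sum(w[d] * pooled[d] for d in range(dim)) for w in W]
    return math.log(softmax(logits)[target])


def erasure_attributions(context, W, target):
    # Erasure-style attribution (assumption): the score of word i is the drop
    # in log p(target) when that word's embedding is replaced by zeros.
    base = target_logprob(context, W, target)
    zero = [0.0] * len(context[0])
    return [
        base - target_logprob(context[:i] + [zero] + context[i + 1:], W, target)
        for i in range(len(context))
    ]


def ridge_fit(X, y, alpha):
    # Solve (X^T X + alpha * I) w = X^T y by Gauss-Jordan elimination.
    k = len(X[0])
    A = [
        [sum(r[i] * r[j] for r in X) + (alpha if i == j else 0.0) for j in range(k)]
        + [sum(r[i] * t for r, t in zip(X, y))]
        for i in range(k)
    ]
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        pivot = A[c][c]
        A[c] = [v / pivot for v in A[c]]
        for r in range(k):
            if r != c:
                f = A[r][c]
                A[r] = [vr - f * vc for vr, vc in zip(A[r], A[c])]
    return [A[i][k] for i in range(k)]


def pearson(a, b):
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


def demo(seed=0, dim=3, vocab=5, n_samples=200):
    rng = random.Random(seed)
    W = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(vocab)]
    true_w = [1.0, -0.5, 0.25, 0.75]  # synthetic voxel weights (made up)
    X, y = [], []
    for _ in range(n_samples):
        context = [[rng.gauss(0, 1) for _ in range(dim)] for _ in true_w]
        feats = erasure_attributions(context, W, rng.randrange(vocab))
        X.append(feats)
        # Synthetic "voxel" response: linear in the attributions plus noise.
        y.append(sum(w * f for w, f in zip(true_w, feats)) + rng.gauss(0, 0.1))
    split = n_samples * 3 // 4
    w = ridge_fit(X[:split], y[:split], alpha=1.0)
    preds = [sum(wi * fi for wi, fi in zip(w, row)) for row in X[split:]]
    # Held-out correlation between predicted and synthetic responses.
    return pearson(preds, y[split:])
```

Calling `demo()` returns the held-out Pearson correlation between predicted and synthetic responses. Because the synthetic signal is built from the attributions, a high correlation here only confirms that the pipeline is wired correctly; it says nothing about real fMRI data.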

Subjects: Computation and Language, Artificial Intelligence, Neurons and Cognition

Publish: 2025-02-20 16:05:45 UTC