#1 A Theoretical Study of (Hyper) Self-Attention through the Lens of Interactions: Representation, Training, Generalization

Authors: Muhammed Ustaomeroglu, Guannan Qu

Self-attention has emerged as a core component of modern neural architectures, yet its theoretical underpinnings remain elusive. In this paper, we study self-attention through the lens of *interacting entities*, ranging from agents in multi-agent reinforcement learning to alleles in genetic sequences, and show that a single layer of linear self-attention can *efficiently* represent, learn, and generalize functions capturing pairwise interactions, including in out-of-distribution scenarios. Our analysis reveals that self-attention acts as a *mutual interaction learner* under minimal assumptions on the diversity of interaction patterns observed during training, thereby encompassing a wide variety of real-world domains. We validate these theoretical insights with experiments demonstrating that self-attention learns interaction functions and generalizes both across population distributions and to out-of-distribution scenarios. Building on this theory, we introduce *HyperFeatureAttention*, a novel neural network module designed to learn couplings among different feature-level interactions between entities. We further propose *HyperAttention*, a new module that extends beyond pairwise interactions to capture multi-entity dependencies, such as three-way, four-way, or general $n$-way interactions.
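
To make the pairwise-interaction reading concrete, below is a minimal sketch of a single layer of linear (softmax-free) self-attention in NumPy. It illustrates the general mechanism only, not the authors' implementation; the function name, weight shapes, and random inputs are assumptions made for the example.

```python
import numpy as np

def linear_self_attention(X, W_Q, W_K, W_V):
    """Single layer of linear (softmax-free) self-attention.

    X: (n, d) array of n entity embeddings.
    Row i of the output is sum_j <W_Q x_i, W_K x_j> * (W_V x_j),
    i.e. a sum of bilinear pairwise interaction terms.
    """
    Q = X @ W_Q.T        # (n, d_k) queries
    K = X @ W_K.T        # (n, d_k) keys
    V = X @ W_V.T        # (n, d_v) values
    scores = Q @ K.T     # (n, n) bilinear pairwise scores (no softmax)
    return scores @ V    # (n, d_v) interaction-weighted aggregation

# Illustrative usage with random entities and weights (shapes are assumptions).
rng = np.random.default_rng(0)
n, d, d_k, d_v = 5, 8, 4, 8
X = rng.standard_normal((n, d))
W_Q = rng.standard_normal((d_k, d))
W_K = rng.standard_normal((d_k, d))
W_V = rng.standard_normal((d_v, d))
out = linear_self_attention(X, W_Q, W_K, W_V)
print(out.shape)  # (5, 8)
```

Because the score $\langle W_Q x_i, W_K x_j \rangle$ is bilinear in the pair $(x_i, x_j)$ and the value $W_V x_j$ is linear in $x_j$, each output row is a sum of pairwise terms, which is the sense in which a single such layer can represent pairwise interaction functions; on this reading, the HyperAttention module described in the abstract would replace the pairwise score with a multilinear $n$-way one.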

Subject: ICML.2025 - Poster