2511.11553

Multistability of Self-Attention Dynamics in Transformers

Author: Claudio Altafini

In machine learning, self-attention dynamics are a continuous-time, multiagent-like model of the attention mechanism of transformers. In this paper we show that these dynamics are related to a multiagent version of the Oja flow, a dynamical system that computes the principal eigenvector of a matrix, which for transformers corresponds to the value matrix. We classify the equilibria of the "single-head" self-attention system into four classes: consensus, bipartite consensus, clustering, and polygonal equilibria. Multiple asymptotically stable equilibria from the first three classes often coexist in the self-attention dynamics. Interestingly, equilibria from the first two classes are always aligned with the eigenvectors of the value matrix, often but not exclusively with the principal eigenvector.
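
For readers unfamiliar with the Oja flow mentioned above, the following is a minimal sketch of the classical single-vector version, dx/dt = A x - (xᵀ A x) x, and not the multiagent self-attention system studied in the paper. Several things here are assumptions made for illustration: the matrix `A` is symmetric with hand-picked eigenvalues, the integrator is forward Euler with a renormalization step, and the name `oja_flow` is hypothetical.

```python
import numpy as np


def oja_flow(A, x0, dt=0.01, steps=5000):
    """Integrate dx/dt = A x - (x^T A x) x with forward Euler.

    Illustrative sketch only: A is assumed symmetric, and the state is
    renormalized each step to counter numerical drift off the unit sphere.
    """
    x = x0 / np.linalg.norm(x0)
    for _ in range(steps):
        Ax = A @ x
        x = x + dt * (Ax - (x @ Ax) * x)
        x = x / np.linalg.norm(x)
    return x


rng = np.random.default_rng(0)

# Build a symmetric test matrix with known, well-separated eigenvalues
# (an assumption for this demo, not a property taken from the paper).
Q, _ = np.linalg.qr(rng.standard_normal((4, 4)))
A = Q @ np.diag([3.0, 2.0, 1.0, 0.5]) @ Q.T

x = oja_flow(A, rng.standard_normal(4))

# eigh returns eigenvalues in ascending order, so the last column is the
# principal eigenvector. The sign is arbitrary, hence the absolute value.
eigvals, eigvecs = np.linalg.eigh(A)
alignment = abs(x @ eigvecs[:, -1])
print("alignment with principal eigenvector:", alignment)
```

An alignment close to 1 indicates that the flow has settled on the principal eigenvector up to sign. The abstract's observation is that the multiagent self-attention version can also settle on other eigenvectors of the value matrix, which this single-vector sketch does not capture.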

Subjects: Machine Learning, Systems and Control, Dynamical Systems

Publish: 2025-11-14 18:45:22 UTC