The Shape of Wisdom: Decision Trajectories in Language Models

Language models do not simply choose an answer at the output layer. In a 9,000-trajectory MMLU study across Qwen2.5-7B-Instruct, Llama-3.1-8B-Instruct, and Mistral-7B-Instruct-v0.3, the score of the answer moves across depth in structured ways. We describe each trajectory with three quantities: the current answer margin, the next-layer change in that margin, and the distance from a decision flip. The main empirical picture is that correctness and stability are different: the largest group is unstable-correct, not stable-correct. A traced subset then asks what moves the margin. In stable-correct cases, the average attention scalar points in the correct direction, while the average MLP scalar does not; span deletion shows that removing answer-supporting text hurts the margin and removing distractor-like text helps it. The result is not a full circuit explanation. It is a reproducible way to see which answers are settled, which remain fragile, and which measured sources move them.
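The three trajectory quantities named above can be made concrete with a small sketch. It is an illustration under stated assumptions, not the paper's implementation. It assumes per-layer scores for each answer option are already available as plain lists. It takes the margin to be the correct option's score minus the best competing score, and it takes "distance from a decision flip" to be the absolute margin, that is, how far the margin sits from zero. The function names and the toy scores are hypothetical.

```python
from typing import List, Sequence


def margin_trajectory(scores: Sequence[Sequence[float]], correct: int) -> List[float]:
    """Per-layer margin: score of the correct option minus the best other option.

    Assumption: `scores[l][k]` is the score of option k read out at layer l.
    """
    margins = []
    for layer_scores in scores:
        best_other = max(s for i, s in enumerate(layer_scores) if i != correct)
        margins.append(layer_scores[correct] - best_other)
    return margins


def next_layer_change(margins: Sequence[float]) -> List[float]:
    """Change in margin from each layer to the next (one entry fewer than margins)."""
    return [b - a for a, b in zip(margins, margins[1:])]


def flip_distance(margins: Sequence[float]) -> List[float]:
    """Assumed proxy for distance from a decision flip: |margin| at each layer."""
    return [abs(m) for m in margins]


def count_flips(margins: Sequence[float]) -> int:
    """Number of times the margin changes sign between consecutive layers."""
    signs = [m > 0 for m in margins]
    return sum(a != b for a, b in zip(signs, signs[1:]))


# Toy example: 4 layers, 4 answer options, option 0 is correct (made-up numbers).
toy_scores = [
    [1.0, 3.0, 2.0, 0.0],
    [2.5, 2.0, 1.0, 0.0],
    [2.0, 3.0, 1.0, 0.0],
    [5.0, 2.0, 1.0, 0.0],
]
margins = margin_trajectory(toy_scores, correct=0)
print(margins)                     # [-2.0, 0.5, -1.0, 3.0]
print(next_layer_change(margins))  # [2.5, -1.5, 4.0]
print(flip_distance(margins))      # [2.0, 0.5, 1.0, 3.0]
print(count_flips(margins))        # 3
```

In this toy trajectory the final margin is positive, so the answer ends up correct, but the sign changes three times on the way there. That is the kind of case in which correctness and stability come apart.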

Subjects: Artificial Intelligence, Computation and Language, Machine Learning

Publish: 2026-05-31 12:33:36 UTC

2606.01202
