Tracing the Representation Geometry of Language Models from Pretraining to Post-training

Authors: Melody Zixuan Li, Kumar Krishna Agrawal, Arna Ghosh, Komal Kumar Teru, Adam Santoro, Guillaume Lajoie, Blake Aaron Richards

Standard training metrics such as loss fail to explain the emergence of complex capabilities in large language models. We take a spectral approach to investigating the geometry of learned representations across pretraining and post-training, measuring effective rank (RankMe) and eigenspectrum decay (αReQ). Across OLMo (1B-7B) and Pythia (160M-12B) models, we uncover a consistent non-monotonic sequence of three geometric phases during autoregressive pretraining. The initial “warmup” phase exhibits rapid representational collapse. This is followed by an “entropy-seeking” phase, in which the manifold’s dimensionality expands substantially, coinciding with peak n-gram memorization. Subsequently, a “compression-seeking” phase imposes anisotropic consolidation, selectively preserving variance along dominant eigendirections while contracting others; this transition is marked by significant improvements in downstream task performance. We show that these phases can emerge from a fundamental interplay between cross-entropy optimization under skewed token frequencies and representational bottlenecks (d ≪ |V|). Post-training further transforms the geometry: SFT and DPO drive “entropy-seeking” dynamics to integrate specific instructional or preferential data, improving in-distribution performance while degrading out-of-distribution robustness. Conversely, RLVR induces “compression-seeking” dynamics, enhancing reward alignment but reducing generation diversity.
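The abstract names the two spectral diagnostics without implementation details; the sketch below shows how they are typically computed from a matrix of hidden representations. RankMe is the exponential of the entropy of the L1-normalized singular values, and αReQ is the power-law decay exponent of the covariance eigenspectrum, estimated here by a log-log linear fit. The epsilon, the full-spectrum fitting range, and the random toy data are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def rankme(Z: np.ndarray, eps: float = 1e-7) -> float:
    """Effective rank: exponential of the entropy of the
    L1-normalized singular values of the representation matrix Z."""
    s = np.linalg.svd(Z, compute_uv=False)
    p = s / s.sum() + eps                 # normalized spectrum (+eps for log stability)
    return float(np.exp(-(p * np.log(p)).sum()))

def alpha_req(Z: np.ndarray) -> float:
    """Eigenspectrum decay exponent alpha: fit lambda_k ~ k^(-alpha)
    to the eigenvalues of the feature covariance via log-log regression."""
    lam = np.linalg.eigvalsh(np.cov(Z, rowvar=False))[::-1]  # descending eigenvalues
    lam = lam[lam > 0]                    # keep the strictly positive spectrum
    k = np.arange(1, lam.size + 1)
    slope, _ = np.polyfit(np.log(k), np.log(lam), 1)
    return float(-slope)                  # alpha > 0 for a decaying spectrum

# Toy usage: rows are token representations (n tokens), columns are hidden dims (d).
Z = np.random.randn(4096, 512)
print(f"RankMe = {rankme(Z):.1f}, alpha = {alpha_req(Z):.2f}")
```

Read against the phases in the abstract, a rising effective rank with a flatter spectrum (smaller α) corresponds to “entropy-seeking” expansion, while a falling rank with variance concentrating in a few dominant eigendirections (larger α) corresponds to “compression-seeking” consolidation.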

Subject: NeurIPS.2025 - Poster