Perturbation is All You Need for Extrapolating Language Models

Authors: Zetai Cen, Jin Zhu, Xinwei Shen, Chengchun Shi

We introduce a simple yet powerful framework for training large language models. In contrast to the standard autoregressive next-token prediction based on an exact prefix, we propose a perturbation-based procedure that first transforms the prefix into a semantic neighbor and then conditions on this perturbed variant for next-token prediction. This yields a hierarchical model with a pre-post-additive noise structure. Within this framework, we develop a rigorous theory of extrapolability, namely, the capacity of a model class to make reliable predictions for token sequences that lie outside the empirical support of the training corpus. We evaluate the finite-sample performance of the proposed procedure using both synthetic and real-world language data. Results show that the proposed method consistently improves out-of-support prediction while maintaining competitive in-support performance, demonstrating that perturbation offers a practical route to language modeling.
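The perturb-then-predict idea described above can be illustrated with a toy, count-based sketch. This is not the authors' model or training procedure: the neighbor table `NEIGHBORS`, the replacement probability `p`, the number of draws `n_draws`, and the functions `perturb`, `train`, and `predict` are all hypothetical names introduced here for illustration. The sketch assumes a "semantic neighbor" can be obtained by randomly swapping tokens for hand-listed near-synonyms.

```python
import random
from collections import Counter, defaultdict

# Hypothetical neighbor table: tokens assumed to be semantically interchangeable.
NEIGHBORS = {
    "cat": ["cat", "kitten"],
    "kitten": ["kitten", "cat"],
    "dog": ["dog", "puppy"],
    "puppy": ["puppy", "dog"],
}


def perturb(prefix, rng, p=0.5):
    """Map a prefix to a 'semantic neighbor' (assumed scheme, not the paper's):
    each token with an entry in NEIGHBORS is resampled with probability p."""
    out = []
    for tok in prefix:
        if tok in NEIGHBORS and rng.random() < p:
            out.append(rng.choice(NEIGHBORS[tok]))
        else:
            out.append(tok)
    return tuple(out)


def train(corpus, rng, n_draws=200, p=0.5):
    """Count next tokens conditioned on perturbed prefixes rather than exact ones."""
    counts = defaultdict(Counter)
    for sent in corpus:
        for t in range(1, len(sent)):
            prefix, nxt = sent[:t], sent[t]
            for _ in range(n_draws):
                counts[perturb(prefix, rng, p)][nxt] += 1
    return counts


def predict(counts, prefix):
    """Return the most frequent next token for a prefix, or None if unseen."""
    c = counts.get(tuple(prefix))
    return c.most_common(1)[0][0] if c else None


rng = random.Random(0)
corpus = [("the", "cat", "sleeps"), ("the", "dog", "barks")]

perturbed_model = train(corpus, rng)      # conditions on perturbed prefixes
exact_model = train(corpus, rng, p=0.0)   # p=0 recovers exact-prefix counting

# ("the", "kitten") never appears in the corpus, i.e. it is out of support.
out_of_support = predict(perturbed_model, ["the", "kitten"])
baseline = predict(exact_model, ["the", "kitten"])
```

In this toy setting the exact-prefix counter has no entry for the unseen prefix, while the perturbed counter borrows the continuation observed after its neighbor. A real language model would replace the count table with a learned conditional distribution, and the perturbation mechanism would need to be defined as in the paper.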

Subjects: Machine Learning, Machine Learning, Statistics Theory

Published: 2026-05-05 23:03:33 UTC

2605.04344
