Evaluating Generalization and Representation Stability in Small LMs via Prompting

We investigate the generalization capabilities of small language models under two popular adaptation paradigms: few-shot prompting and supervised fine-tuning. While prompting is often favored for its parameter efficiency and flexibility, it remains unclear how robust this approach is in low-resource settings and under distributional shifts. This paper presents a comparative study of prompting and fine-tuning across task formats, prompt styles, and model scales, with a focus on their behavior in both in-distribution and out-of-distribution (OOD) settings. Beyond accuracy, we analyze the internal representations learned by each approach to assess the stability and abstraction of task-specific features. Our findings highlight critical differences in how small models internalize and generalize knowledge under different adaptation strategies. This work offers practical guidance for model selection in low-data regimes and contributes empirical insight into the ongoing debate over prompting versus fine-tuning. Code for the experiments is available at the following

Subjects: Artificial Intelligence, Machine Learning

Published: 2025-06-16 01:44:26 UTC

2506.17289
