Information Guided Regularization for Fine-tuning Language Models

Authors: Mandar Sharma, Nikhil Muralidhar, Shengzhe Xu, Raquib Bin Yousuf, Naren Ramakrishnan

The pretraining-fine-tuning paradigm has been the de facto strategy for transfer learning in modern language modeling. With the understanding that task adaptation in LMs is often a function of parameters shared across tasks, we argue that a more surgical approach to regularization needs to exist for smoother transfer learning. Towards this end, we investigate how the pretraining loss landscape is affected by these task-sensitive parameters through an information-theoretic lens. We then leverage the findings from our investigations to devise a novel approach to dropout for improved model regularization and better downstream generalization. This approach, named guided dropout, is both task- and architecture-agnostic and adds no computational overhead to the fine-tuning process. Through empirical evaluations, we showcase that our approach to regularization yields consistently better performance, even in scenarios of data paucity, compared to standardized baselines.

Subjects: Computation and Language, Artificial Intelligence, Machine Learning

Published: 2024-06-20 05:18:37 UTC

2406.14005
