Neutral Residues: Revisiting Adapters for Model Extension


Authors: Franck Signe Talla, Edouard Grave, Hervé Jégou

We address the problem of extending a pretrained large language model to a new domain that was not seen during training. Standard techniques, such as finetuning or low-rank adaptation (LoRA), are successful at domain adaptation but do not formally add capacity to the model. This often leads to a trade-off between performing well on the new domain and degrading performance on the original domain. Here, we revisit and improve adapters to extend LLMs from three angles: data, architecture, and training procedure, which are advantageously considered jointly. The resulting method, called neutral residues, modifies adapters so that each new residual block outputs near-zero values on the original domain. This solution leads to strong results when adapting a state-of-the-art model originally trained on English to a new language. Neutral residues significantly outperform competing approaches such as finetuning, LoRA, and vanilla adapters in terms of the trade-off between learning the new language and not forgetting English.
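The abstract does not spell out how the "near-zero on the original domain" property is obtained. The following is a minimal sketch of the general idea only, assuming a standard bottleneck adapter added as a new residual block and an L1 penalty on that block's residual output for original-domain (e.g. English) inputs. It is not the authors' exact method, which the abstract says combines changes to data, architecture, and training procedure. The names `AdapterBlock`, `neutrality_penalty`, `training_loss`, and the weight `lam` are hypothetical.

```python
import numpy as np


class AdapterBlock:
    """Hypothetical bottleneck adapter used as an extra residual block."""

    def __init__(self, d_model, d_hidden, rng):
        # Down-projection with small random weights.
        self.w_down = rng.normal(0.0, 0.02, (d_model, d_hidden))
        # Zero-initialized up-projection: the block starts as an identity map,
        # so the pretrained model's behavior is unchanged before training.
        self.w_up = np.zeros((d_hidden, d_model))

    def residual(self, x):
        # ReLU bottleneck; this is the quantity we want near zero on old data.
        h = np.maximum(x @ self.w_down, 0.0)
        return h @ self.w_up

    def __call__(self, x):
        # Residual connection: output = input + adapter contribution.
        return x + self.residual(x)


def neutrality_penalty(block, x_orig):
    """Mean absolute residual on original-domain activations (assumed L1 form)."""
    return float(np.mean(np.abs(block.residual(x_orig))))


def training_loss(new_domain_loss, block, x_orig, lam=0.1):
    """Hypothetical combined objective: fit the new domain, stay neutral on the old."""
    return new_domain_loss + lam * neutrality_penalty(block, x_orig)


rng = np.random.default_rng(0)
block = AdapterBlock(d_model=8, d_hidden=4, rng=rng)
x_english = rng.normal(size=(3, 8))
print(neutrality_penalty(block, x_english))  # zero at initialization
```

Zero-initializing the up-projection is a common adapter choice that makes the added block start out neutral; the penalty term then discourages the block from drifting away from neutrality on original-domain inputs while it learns the new domain.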

Subjects: Computation and Language, Artificial Intelligence, Machine Learning

Published: 2024-10-03 17:55:17 UTC

arXiv: 2410.02744
