Adaptive Rank Selections for Low-Rank Approximation of Language Models


Authors: Shangqian Gao ; Ting Hua ; Yen-Chang Hsu ; Yilin Shen ; Hongxia Jin

Singular Value Decomposition (SVD) and its weighted variants have made significant progress in compressing language models. Previous works assume the same importance for all operations and assign the same number of ranks to every layer in a language model. However, such a uniform rank selection is sub-optimal, since different operations (layers) have non-uniform demands on capacity. In other words, a desirable SVD strategy should allocate more ranks to important operations and fewer to less important ones. A globally optimized selection of ranks for neural networks is still an open problem, and it is a non-trivial challenge because the selection is discrete. In this work, we propose a novel binary masking mechanism for optimizing the number of ranks in a differentiable framework. Our strategy uses a novel regularization to make the masking comply with the SVD property that ranks are ordered by sorted singular values. The experiments cover both encoder-only and decoder-only language models, including large language models such as LLaMA. Our compressed model achieves much better accuracy than previous SVD methods and their state-of-the-art variants. More interestingly, our method retains significantly better accuracy with zero or limited fine-tuning, demonstrating the substantial advantage of adaptive rank selection.
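The idea of a binary mask over ranks can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract does not specify the mask parameterization or the regularizer, so the helper names `masked_svd_approx` and `prefix_mask` are hypothetical and the mask here is fixed by hand rather than learned. The sketch only shows why a mask that follows the sorted singular values (ones first, then zeros) is the natural target: it coincides with ordinary rank-k truncation.

```python
import numpy as np

def masked_svd_approx(W, mask):
    """Rebuild W from its SVD, keeping only the components where mask == 1.

    Hypothetical helper for illustration; the mask is given, not learned.
    """
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    # Zero out the masked singular values, then recombine the factors.
    return (U * (s * mask)) @ Vt

def prefix_mask(n, k):
    """Binary mask that keeps the k largest singular values (ones, then zeros)."""
    m = np.zeros(n)
    m[:k] = 1.0
    return m

rng = np.random.default_rng(0)
W = rng.standard_normal((8, 6))  # stand-in for one layer's weight matrix
k = 3                            # rank assigned to this layer

# A sorted (prefix) mask reproduces standard rank-k truncated SVD.
approx = masked_svd_approx(W, prefix_mask(6, k))

# A mask with the same number of ones that skips the largest singular value.
unsorted_mask = np.array([0.0, 1.0, 1.0, 1.0, 0.0, 0.0])
approx_unsorted = masked_svd_approx(W, unsorted_mask)

err_sorted = np.linalg.norm(W - approx)
err_unsorted = np.linalg.norm(W - approx_unsorted)
print(err_sorted <= err_unsorted)
```

Under this reading, adaptive rank selection amounts to choosing a different `k` for each layer under a shared budget, instead of using one `k` everywhere.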

2024.naacl-long.13@ACL
