mzdR2i9XD2@OpenReview

#1 Parameter-Efficient and Stable Singular Value Adaptation for Pre-Trained Models

Authors: Fanxu Meng, Muhan Zhang

We observe that the Q, K, O, and V matrices in attention layers can be losslessly absorbed and decomposed into four head-wise orthogonal matrices and two sets of singular values. After this orthogonalization, we freeze the singular vectors and fine-tune only the singular values, enabling stable fine-tuning constrained to the original latent space; this yields a 5.4% improvement over LoRA across eight commonsense reasoning datasets. The absorb-decompose operation also eliminates redundant singular vectors losslessly: applied on its own, it reduces the encoder parameters of Whisper-large-v3 by 46.42%.

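As a rough illustration of the core idea (not the paper's released code), the PyTorch sketch below fine-tunes only the singular values of a single pre-trained weight matrix and shows how numerically zero singular directions can be dropped without changing the layer's function. The head-wise absorption of the attention matrices described in the abstract is omitted, and `SingularValueAdapter` and `truncate_rank` are hypothetical names introduced here for the example.

```python
import torch
import torch.nn as nn


class SingularValueAdapter(nn.Module):
    """Sketch: freeze a weight's singular vectors, train only its singular values."""

    def __init__(self, weight: torch.Tensor):
        super().__init__()
        # Thin SVD of the pre-trained weight: W = U diag(s) V^T.
        U, s, Vh = torch.linalg.svd(weight, full_matrices=False)
        # Singular vectors are frozen buffers, so every update stays
        # inside the original latent space spanned by U and V.
        self.register_buffer("U", U)
        self.register_buffer("Vh", Vh)
        # Only the singular values are trainable parameters.
        self.s = nn.Parameter(s.clone())

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # y = x W^T with W = U diag(s) V^T, applied factor by factor.
        return ((x @ self.Vh.T) * self.s) @ self.U.T


def truncate_rank(weight: torch.Tensor, rel_tol: float = 1e-6):
    """Drop singular directions with (numerically) zero singular values.

    The reconstruction U diag(s) V^T is unchanged up to rel_tol, which is
    the sense in which redundant vectors can be removed losslessly.
    """
    U, s, Vh = torch.linalg.svd(weight, full_matrices=False)
    keep = s > rel_tol * s.max()
    return U[:, keep], s[keep], Vh[keep, :]


if __name__ == "__main__":
    # The adapter reproduces the frozen layer exactly before any training step.
    layer = nn.Linear(64, 64, bias=False)
    adapter = SingularValueAdapter(layer.weight.detach())
    x = torch.randn(2, 64)
    assert torch.allclose(adapter(x), layer(x), atol=1e-5)
```

In this sketch the optimizer would receive only `adapter.s`, so the number of trainable parameters per weight matrix equals its number of singular values rather than the full matrix size.
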
Subject: ICLR.2025 - Poster