We observe that the Q, K, V, and O matrices in attention layers can be absorbed and decomposed, without any loss, into four head-wise orthogonal matrices and two sets of singular values. After this orthogonalization, we freeze the singular vectors and fine-tune only the singular values, which enables stable fine-tuning constrained to the original latent space and yields a 5.4% improvement over LoRA across eight commonsense reasoning datasets. Moreover, the absorb-decompose operation losslessly eliminates redundant vectors, reducing the encoder parameters of Whisper-large-v3 by 46.42% when applied on its own.
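The sketch below illustrates the core absorb-decompose idea for the Q-K pair only, assuming a standard multi-head attention parameterization in PyTorch; the class name `AbsorbedQK`, the dimensions, and all variable names are illustrative assumptions rather than the paper's implementation. Per head, the product of the query and key projections is absorbed, factored by SVD into frozen singular vectors and trainable singular values, and used to reproduce the attention logits exactly. The same construction would apply to the V-O pair.

```python
# Minimal sketch of head-wise absorb-decompose for the Q-K pair (illustrative only).
import torch
import torch.nn as nn

d_model, n_heads = 768, 12          # assumed dimensions, not from the paper
d_head = d_model // n_heads

# Pretrained projection weights (random stand-ins here), shaped (d_model, d_model).
W_q = torch.randn(d_model, d_model)
W_k = torch.randn(d_model, d_model)

# Split into per-head blocks: (n_heads, d_model, d_head).
W_q_h = W_q.view(d_model, n_heads, d_head).permute(1, 0, 2)
W_k_h = W_k.view(d_model, n_heads, d_head).permute(1, 0, 2)

class AbsorbedQK(nn.Module):
    """Per head: absorb W_q W_k^T, SVD it, freeze U/V, train only singular values."""
    def __init__(self, W_q_h, W_k_h):
        super().__init__()
        # Absorb: the QK^T logits depend only on the per-head product W_q W_k^T.
        M = W_q_h @ W_k_h.transpose(-1, -2)          # (n_heads, d_model, d_model)
        U, S, Vh = torch.linalg.svd(M, full_matrices=False)
        # Each head's product has rank at most d_head; keeping only those
        # components reproduces M exactly (this is the lossless part).
        r = W_q_h.shape[-1]
        self.register_buffer("U", U[..., :r].contiguous())       # frozen
        self.register_buffer("Vh", Vh[..., :r, :].contiguous())  # frozen
        self.S = nn.Parameter(S[..., :r].contiguous())           # trainable

    def scores(self, x):
        # x: (batch, seq, d_model) -> unscaled attention logits per head.
        q = torch.einsum("bsd,hdr->bhsr", x, self.U * self.S.unsqueeze(-2))
        k = torch.einsum("bsd,hrd->bhsr", x, self.Vh)
        return q @ k.transpose(-1, -2)               # (batch, n_heads, seq, seq)

layer = AbsorbedQK(W_q_h, W_k_h)
# Only n_heads * d_head singular values remain trainable.
print(sum(p.numel() for p in layer.parameters()))
```

Because each head's absorbed product has rank at most `d_head`, truncating the SVD to `d_head` components discards nothing, which is also what permits the lossless removal of redundant vectors mentioned above.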