5OLRHkzTYk@OpenReview

#1 Pivoting Factorization: A Compact Meta Low-Rank Representation of Sparsity for Efficient Inference in Large Language Models

Authors: Jialin Zhao, Yingtao Zhang, Carlo Cannistraci

The rapid growth of Large Language Models has driven demand for effective model compression techniques to reduce memory and computation costs. Low-rank pruning has gained attention for its GPU compatibility across all densities. However, low-rank pruning struggles to match the performance of semi-structured pruning, often doubling perplexity at similar densities. In this paper, we propose **Pi**voting **Fa**ctorization (**PIFA**), a novel **lossless** meta low-rank representation that learns, in an unsupervised manner, a **compact** form of any low-rank representation, effectively eliminating redundant information. PIFA identifies pivot rows (linearly independent rows) and expresses non-pivot rows as linear combinations of them, achieving **24.2%** additional memory savings and **24.6%** faster inference over low-rank layers at rank = 50% of dimension. To mitigate the performance degradation caused by low-rank pruning, we introduce a novel, retraining-free reconstruction method that **m**inimizes error accumulation (**M**). **MPIFA**, combining M and PIFA into an end-to-end framework, significantly outperforms existing low-rank pruning methods and achieves performance comparable to semi-structured pruning, while surpassing it in GPU efficiency and compatibility. Our code is available at https://github.com/biomedical-cybernetics/pivoting-factorization.
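
The abstract describes the pivot-row idea only at a high level; the sketch below illustrates it under our own assumptions and is not the authors' implementation. One plausible realization selects r pivot rows of a rank-r matrix with a column-pivoted QR on the transpose and expresses the remaining rows as linear combinations of the pivots via a pseudoinverse; the helpers `pifa_sketch` and `pifa_reconstruct` are hypothetical names.

```python
import numpy as np
from scipy.linalg import qr  # pivoted QR, used here as one plausible way to find pivot rows

def pifa_sketch(W, r):
    """Hypothetical sketch: store a rank-r matrix W as r pivot rows plus
    coefficients that rebuild every non-pivot row as a linear combination."""
    m, n = W.shape
    # Column-pivoted QR on W.T orders the rows of W by linear independence;
    # the first r pivots index r linearly independent rows.
    _, _, piv = qr(W.T, pivoting=True)
    pivot_idx = np.sort(piv[:r])
    non_pivot_idx = np.setdiff1d(np.arange(m), pivot_idx)

    P = W[pivot_idx]                              # (r, n) pivot rows, kept verbatim
    C = W[non_pivot_idx] @ np.linalg.pinv(P)      # (m-r, r) combination coefficients
    return pivot_idx, non_pivot_idx, P, C

def pifa_reconstruct(m, n, pivot_idx, non_pivot_idx, P, C):
    """Rebuild the full matrix from the pivot rows and coefficients."""
    W_hat = np.empty((m, n))
    W_hat[pivot_idx] = P
    W_hat[non_pivot_idx] = C @ P
    return W_hat

# Toy usage: a square matrix whose rank is 50% of its dimension.
m, n, r = 1024, 1024, 512
W = np.random.randn(m, r) @ np.random.randn(r, n)     # exactly rank r
piv, nonpiv, P, C = pifa_sketch(W, r)
W_hat = pifa_reconstruct(m, n, piv, nonpiv, P, C)
print(np.abs(W - W_hat).max())                        # lossless up to floating-point error
print((P.size + C.size) / (r * (m + n)))              # ~0.75 vs. storing a low-rank (A, B) pair
```

Under this parameter counting (and ignoring index storage, which is our assumption), a low-rank pair stores r(m+n) values while the pivot form stores rn + (m-r)r = r(m+n) - r^2, a relative reduction of r/(m+n); for a square matrix at rank equal to 50% of the dimension this is about 25%, in line with the 24.2% memory saving reported in the abstract.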

Subject: ICML.2025 - Poster