FLRC: Fine-grained Low-Rank Compressor for Efficient LLM Inference

Authors: Yu-Chen Lu, Chong-Yan Chen, Chi-Chih Chang, Yu-Fang Hu, Kai-Chiang Wu

Although large language models (LLMs) have achieved remarkable performance, their enormous parameter counts hinder deployment on resource-constrained hardware. Low-rank compression can reduce both memory usage and computational demand, but applying a uniform compression ratio across all layers often leads to significant performance degradation, and previous methods perform poorly during decoding. To address these issues, we propose the Fine-grained Low-Rank Compressor (FLRC), which efficiently determines an optimal rank allocation for each layer and incorporates progressive low-rank decoding to maintain text generation quality. Comprehensive experiments on diverse benchmarks demonstrate the superiority of FLRC, which achieves up to a 17% improvement in ROUGE-L on summarization tasks over state-of-the-art low-rank compression methods and establishes a more robust and efficient framework for LLM inference.
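To make the memory argument and the case against a uniform ratio concrete, here is a minimal sketch. It is not FLRC's rank-allocation procedure, and it does not model progressive low-rank decoding. Factoring a dense weight W (m×n) as A (m×r) times B (r×n) stores r(m+n) parameters instead of mn, so it only saves memory when r < mn/(m+n). For rank allocation, the sketch assumes that each layer's leading singular values are already known and uses retained energy (the sum of kept σ²) as a stand-in for layer sensitivity. This choice rests on the Eckart–Young result: the Frobenius-norm error of a truncated SVD equals the sum of the discarded σ². The helper names (`lowrank_params`, `break_even_rank`, `allocate_ranks`, `retained_energy`) and the toy spectra are hypothetical.

```python
import heapq
from typing import Dict, List, Tuple

# Hypothetical layer description: (m, n, leading singular values, sorted descending)
Layer = Tuple[int, int, List[float]]


def lowrank_params(m: int, n: int, r: int) -> int:
    """Parameters stored by W ~= A @ B with A: m x r and B: r x n."""
    return r * (m + n)


def break_even_rank(m: int, n: int) -> int:
    """Largest r for which r*(m+n) < m*n, i.e. the factored form is strictly smaller."""
    return (m * n - 1) // (m + n)


def allocate_ranks(layers: Dict[str, Layer], budget: int) -> Dict[str, int]:
    """Greedy heuristic (assumption, not FLRC): repeatedly give one more rank to the
    layer whose next singular value adds the most energy (sigma^2) per parameter,
    until the global parameter budget is exhausted."""
    ranks = {name: 0 for name in layers}
    used = 0
    heap: List[Tuple[float, str]] = []  # min-heap of (-gain per parameter, layer name)
    for name, (m, n, sv) in layers.items():
        if sv:
            heapq.heappush(heap, (-(sv[0] ** 2) / (m + n), name))
    while heap:
        _, name = heapq.heappop(heap)
        m, n, sv = layers[name]
        cost = m + n  # one extra rank adds a column to A and a row to B
        if used + cost > budget:
            continue  # this layer's cost will never fit again, so drop it
        ranks[name] += 1
        used += cost
        r = ranks[name]
        if r < len(sv):
            heapq.heappush(heap, (-(sv[r] ** 2) / cost, name))
    return ranks


def retained_energy(layers: Dict[str, Layer], ranks: Dict[str, int]) -> float:
    """Sum of squared singular values kept across all layers."""
    return sum(sum(s * s for s in layers[k][2][: ranks[k]]) for k in layers)


if __name__ == "__main__":
    print(break_even_rank(4096, 4096))
    toy = {
        "a": (64, 64, [10.0, 1.0, 0.5, 0.1]),  # energy concentrated in one direction
        "b": (64, 64, [5.0, 4.0, 3.0, 0.2]),   # energy spread over several directions
    }
    budget = 4 * (64 + 64)  # same total as giving each layer rank 2
    adaptive = allocate_ranks(toy, budget)
    uniform = {"a": 2, "b": 2}
    print(adaptive, retained_energy(toy, adaptive), retained_energy(toy, uniform))
```

With the same parameter budget, a uniform rank of 2 spends capacity on layer `a`'s small second singular value, while the greedy split moves that rank to layer `b` and retains more total energy. Real methods, FLRC included, may score layers with task-aware importance rather than raw σ², so treat this only as an illustration of why per-layer ranks can beat a single global ratio.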

Subject: EMNLP 2025 - Main

2025.emnlp-main.755@ACL