Boosting Parameter Efficiency in LLM-Based Recommendation through Sophisticated Pruning

Authors: Shanle Zheng, Keqin Bao, Jizhi Zhang, Yang Zhang, Fuli Feng, Xiangnan He

LLM-based recommender systems have made significant progress; however, the deployment cost associated with the large parameter volume of LLMs still hinders their real-world applications. This work explores parameter pruning to improve parameter efficiency while maintaining recommendation quality, thereby enabling easier deployment. Unlike existing approaches that focus primarily on inter-layer redundancy, we uncover intra-layer redundancy within components such as self-attention and MLP modules. Building on this analysis, we propose a more fine-grained pruning approach that integrates both intra-layer and layer-wise pruning. Specifically, we introduce a three-stage pruning strategy that progressively prunes parameters at different levels and parts of the model, moving from intra-layer to layer-wise pruning, or from width to depth. Each stage also includes a performance restoration step using distillation techniques, helping to strike a balance between performance and parameter efficiency. Empirical results demonstrate the effectiveness of our approach: across three datasets, our models achieve an average of 88% of the original model's performance while pruning more than 95% of the non-embedding parameters. This underscores the potential of our method to significantly reduce resource requirements without greatly compromising recommendation quality. Our code will be available at: https://github.com/zheng-sl/PruneRec
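
To make the width-then-depth idea concrete, the following is a rough parameter-counting sketch, not the authors' implementation. It assumes a LLaMA-style block (four attention projections and a gated MLP with three matrices, ignoring norms and biases). The configuration values and the split into "attention width, MLP width, depth" stages are hypothetical choices for illustration only, and the distillation-based restoration step is not modeled.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Config:
    hidden: int    # residual-stream width
    heads: int     # attention heads per layer
    head_dim: int  # dimension of each head
    mlp_dim: int   # MLP intermediate width
    layers: int    # number of transformer blocks


def non_embedding_params(c: Config) -> int:
    """Approximate non-embedding parameter count (assumed block layout)."""
    attn = 4 * c.hidden * c.heads * c.head_dim  # Q, K, V, O projections
    mlp = 3 * c.hidden * c.mlp_dim              # gate, up, down projections
    return c.layers * (attn + mlp)


def pruned_fraction(original: Config, pruned: Config) -> float:
    """Share of the original non-embedding parameters that was removed."""
    return 1.0 - non_embedding_params(pruned) / non_embedding_params(original)


# Hypothetical starting model and hypothetical pruning targets.
base = Config(hidden=2048, heads=16, head_dim=128, mlp_dim=8192, layers=24)
stage1 = replace(base, heads=4)         # width: remove attention heads
stage2 = replace(stage1, mlp_dim=1024)  # width: narrow the MLP
stage3 = replace(stage2, layers=6)      # depth: remove whole layers

for name, cfg in [("base", base), ("stage1", stage1),
                  ("stage2", stage2), ("stage3", stage3)]:
    print(f"{name}: {non_embedding_params(cfg):,} params, "
          f"{pruned_fraction(base, cfg):.1%} pruned")
```

Under these assumed numbers, width pruning alone shrinks each block substantially, and the later depth stage multiplies that saving by removing whole blocks, which is how a combined schedule can reach pruning ratios in the range the abstract reports.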

Subject: Information Retrieval

Published: 2025-07-09 17:26:10 UTC

arXiv: 2507.07064
