NoWag: A Unified Framework for Shape Preserving Compression of Large Language Models

Authors: Lawrence Ray Liu, Inesh Chakrabarti, Yixiao Li, Mengdi Wang, Tuo Zhao, Lin Yang

Large language models (LLMs) exhibit remarkable performance across various natural language processing tasks but suffer from immense computational and memory demands, limiting their deployment in resource-constrained environments. To address this challenge, we propose NoWag (Normalized Weight and Activation Guided Compression), a unified framework for one-shot shape-preserving compression algorithms. We apply NoWag to compress Llama-2 (7B, 13B, 70B) and Llama-3 (8B, 70B) models using two popular shape-preserving techniques: vector quantization (NoWag-VQ) and unstructured/semi-structured pruning (NoWag-P). Our results show that NoWag-VQ significantly outperforms state-of-the-art one-shot vector quantization methods, while NoWag-P performs competitively against leading pruning techniques. These findings highlight underlying commonalities between these compression paradigms and suggest promising directions for future research. Our code is available at https://github.com/LawrenceRLiu/NoWag
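
To make "shape-preserving" and "semi-structured pruning" concrete, here is a minimal sketch of generic 2:4 pruning on a toy weight matrix. It is not the NoWag-P algorithm: it assumes a plain magnitude criterion rather than the normalized weight and activation guided score the abstract refers to, and the function name `prune_n_of_m` and the toy matrix `W` are hypothetical, chosen only for illustration.

```python
def prune_n_of_m(weights, n=2, m=4):
    """Keep the n largest-magnitude entries in every group of m
    consecutive weights per row and zero the rest.

    Assumption: plain magnitude is used as the importance score.
    This is a generic illustration, not NoWag's criterion.
    """
    pruned = []
    for row in weights:
        if len(row) % m != 0:
            raise ValueError("row length must be a multiple of m")
        new_row = []
        for start in range(0, len(row), m):
            group = row[start:start + m]
            # Positions of the n largest-magnitude entries in this group.
            order = sorted(range(m), key=lambda i: abs(group[i]), reverse=True)
            keep = set(order[:n])
            new_row.extend(w if i in keep else 0.0 for i, w in enumerate(group))
        pruned.append(new_row)
    return pruned


# Toy 2 x 8 weight matrix (hypothetical values).
W = [
    [0.9, -0.1, 0.05, -0.7, 0.2, 0.3, -0.4, 0.01],
    [0.0, 0.5, -0.6, 0.1, 1.2, -0.02, 0.03, -1.1],
]
W_pruned = prune_n_of_m(W)

# The matrix keeps its 2 x 8 shape; only the values change.
for row in W_pruned:
    print(row)
```

The output has the same dimensions as the input, which is what distinguishes shape-preserving methods such as pruning and quantization from approaches that change layer sizes.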

Subject: COLM.2025

EfTuzTijDo@OpenReview
