One-shot post-training pruning facilitates the deployment of billion-scale large language models (LLMs), with the pruning metric playing a pivotal role in determining which weights to remove. However, existing metrics underperform because they rely on a simple symbolic combination of weights and activations, overlooking imbalanced weight magnitudes and the disproportionate influence of activation outliers. To overcome these limitations, we introduce \textbf{BaWA}, a novel pruning metric that systematically \underline{Ba}lances \underline{W}eight and \underline{A}ctivation distributions for more effective pruning. BaWA introduces two key innovations: \textbf{magnitude normalization}, which mitigates weight imbalance across channels for fairer pruning decisions, and \textbf{outlier regularization}, which reduces the impact of activation outliers to ensure more appropriate channel prioritization. To further enhance its effectiveness, BaWA incorporates an efficient, automatic framework for optimizing the normalization and regularization hyperparameters. Extensive experiments validate BaWA as a state-of-the-art (SOTA) pruning metric. For instance, applying BaWA to induce 2:4 sparsity in Mistral-7B reduces perplexity in language comprehension by 2.49 and improves average downstream-task accuracy by 3.08\%, outperforming the previous SOTA method Wanda.
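To make the idea concrete, the sketch below shows how a Wanda-style score (weight magnitude times input-activation norm) could be balanced with per-channel weight normalization and activation-outlier clipping before imposing 2:4 sparsity. This is a minimal illustration under our own assumptions, not BaWA's exact formulation: the functions \texttt{wanda\_score}, \texttt{balanced\_score}, and \texttt{prune\_2\_4}, and the hyperparameters \texttt{p} and \texttt{clip\_pct}, are placeholders for the normalization and regularization hyperparameters that BaWA tunes automatically and that are defined in the paper body.
\begin{verbatim}
import numpy as np

def wanda_score(W, X):
    """Wanda-style score: weight magnitude times per-input-channel activation norm."""
    act_norm = np.linalg.norm(X, axis=0)      # ||X_j||_2 for each input channel j
    return np.abs(W) * act_norm[None, :]      # S_ij = |W_ij| * ||X_j||_2

def balanced_score(W, X, p=0.5, clip_pct=99.0):
    """Illustrative 'balanced' variant (NOT the paper's exact metric):
    - normalize weight magnitudes within each output row to reduce channel imbalance;
    - clip extreme activation norms to damp outlier channels."""
    w_mag = np.abs(W)
    w_norm = w_mag / (np.linalg.norm(w_mag, axis=1, keepdims=True) + 1e-8)
    act_norm = np.linalg.norm(X, axis=0)
    act_reg = np.minimum(act_norm, np.percentile(act_norm, clip_pct))
    return (w_norm ** p) * (act_reg[None, :] ** (1 - p))

def prune_2_4(W, scores):
    """Apply 2:4 semi-structured sparsity: in every group of 4 consecutive
    weights along the input dimension, zero the 2 with the lowest scores."""
    out = W.copy()
    rows, cols = W.shape
    for i in range(rows):
        for start in range(0, cols - cols % 4, 4):
            grp = scores[i, start:start + 4]
            drop = np.argsort(grp)[:2]        # indices of the two lowest scores
            out[i, start + drop] = 0.0
    return out
\end{verbatim}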