XAZKPGUcQm@OpenReview

Total: 1

#1 BAME: Block-Aware Mask Evolution for Efficient N:M Sparse Training

Authors: Chenyi Yang, Wenjie Nie, Yuxin Zhang, Yuhang Wu, Xiawu Zheng, Guannan Jiang, Rongrong Ji

N:M sparsity has become an increasingly important tool for DNN compression, achieving practical speedups by allowing at most N non-zero components within every M consecutive weights. Unfortunately, most existing works identify the N:M sparse mask through dense backward propagation that updates all weights, which incurs exorbitant training costs. In this paper, we introduce BAME, a method that maintains consistent sparsity throughout the N:M sparse training process. BAME keeps both forward and backward propagation sparse throughout training, while iteratively performing weight pruning-and-regrowing within designated weight blocks to tailor the N:M mask. These blocks are selected through a joint assessment of accumulated mask oscillation frequency and the expected loss reduction of mask adaptation, thereby ensuring stable and efficient identification of the optimal N:M mask. Our empirical results substantiate the effectiveness of BAME, showing that it performs comparably to or better than previous works that maintain fully dense backward propagation during training. For instance, BAME attains 72.0% top-1 accuracy when training a 1:16 sparse ResNet-50 on ImageNet, surpassing SR-STE by 0.5% while achieving a 2.37x reduction in training FLOPs. Code is released at https://github.com/BAME-xmu/BAME
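
For context, here is a minimal sketch of the N:M constraint the abstract describes: keeping at most N non-zero weights within each group of M consecutive weights. The use of PyTorch and magnitude-based selection are assumptions for illustration, not the authors' released implementation (see their repository for that).

```python
import torch

def nm_sparse_mask(weight: torch.Tensor, n: int = 1, m: int = 16) -> torch.Tensor:
    """Binary mask keeping the top-n magnitude weights per group of m consecutive weights.

    Illustrative only; magnitude-based selection is an assumption, not BAME's criterion.
    """
    flat = weight.reshape(-1, m)               # group consecutive weights into rows of m
    idx = flat.abs().topk(n, dim=1).indices    # indices of the n largest magnitudes per group
    mask = torch.zeros_like(flat)
    mask.scatter_(1, idx, 1.0)                 # 1 marks kept weights, 0 marks pruned ones
    return mask.reshape(weight.shape)

# Example: a 1:16 mask, the sparsity pattern of the ResNet-50 result quoted above.
w = torch.randn(64, 64)                        # element count must be divisible by m
mask = nm_sparse_mask(w, n=1, m=16)
sparse_w = w * mask                            # sparse weights used in forward/backward passes
```

BAME's contribution is not this projection itself but evolving the mask cheaply: rather than recomputing it from dense gradients, it prunes and regrows weights only inside blocks scored by mask oscillation frequency and expected loss reduction, so both training passes stay sparse.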

Subject: ICML.2025 - Poster