2025.acl-short.39@ACL

Total: 1

#1 Accelerating Dense LLMs via L0-regularized Mixture-of-Experts

Authors: Zhenyu Zhang, JiuDong Yang, Taozhaowen Taozhaowen, Meng Chen

Large language models (LLMs) achieve strong performance but suffer from slow and costly inference. Existing acceleration methods often cause noticeable performance degradation, while Mixture-of-Experts (MoE) models require extensive computational resources. In this paper, we propose L0-MoE, a lightweight MoE approach that uses L0 regularization to accelerate dense LLMs with almost no performance loss. Our method introduces a cluster confusion matrix for domain-aware dataset curation and applies dynamic batching for efficient training. Experiments show that L0-MoE achieves up to a 2.5x speedup over dense models while maintaining competitive performance, outperforming existing LLM acceleration baselines.
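The abstract does not detail how the L0-regularized gating is parameterized, so the following is only a minimal sketch of the general idea: differentiable L0 gates over experts via the standard Hard Concrete relaxation (Louizos et al., 2018), with an expected-L0 penalty added to the task loss. The class and parameter names (L0ExpertGate, num_experts, beta, gamma, zeta) and the toy usage are illustrative assumptions, not the paper's implementation.

```python
# Sketch: L0-regularized expert gates via the Hard Concrete relaxation.
# Assumed names/hyperparameters; not taken from the L0-MoE paper itself.
import torch
import torch.nn as nn


class L0ExpertGate(nn.Module):
    """Per-expert stochastic gates whose expected L0 norm is penalized,
    encouraging most experts to be switched off (pruned) at inference."""

    def __init__(self, num_experts: int, beta: float = 2 / 3,
                 gamma: float = -0.1, zeta: float = 1.1):
        super().__init__()
        self.log_alpha = nn.Parameter(torch.zeros(num_experts))  # gate logits
        self.beta, self.gamma, self.zeta = beta, gamma, zeta

    def forward(self) -> torch.Tensor:
        if self.training:
            # Sample Hard Concrete gates: concrete sample, stretch, then clamp.
            u = torch.rand_like(self.log_alpha).clamp(1e-6, 1 - 1e-6)
            s = torch.sigmoid((u.log() - (1 - u).log() + self.log_alpha) / self.beta)
        else:
            # Deterministic gate values at evaluation time.
            s = torch.sigmoid(self.log_alpha)
        s_bar = s * (self.zeta - self.gamma) + self.gamma
        return s_bar.clamp(0.0, 1.0)  # most gates end up exactly 0 or 1

    def l0_penalty(self) -> torch.Tensor:
        # Expected number of active (non-zero) gates; weighted into the loss.
        shift = self.beta * torch.log(torch.tensor(-self.gamma / self.zeta))
        return torch.sigmoid(self.log_alpha - shift).sum()


# Toy usage: mix expert outputs with the gates and penalize the active-expert count.
gate = L0ExpertGate(num_experts=8)
expert_outputs = torch.randn(8, 4, 16)          # [experts, batch, hidden]
z = gate()                                      # [experts]
mixed = (z.view(-1, 1, 1) * expert_outputs).sum(dim=0)
loss = mixed.pow(2).mean() + 1e-2 * gate.l0_penalty()
loss.backward()
```

With such gates, the L0 penalty drives many gate values to exactly zero during training, so the corresponding experts can be dropped entirely at inference, which is one plausible route to the reported speedup over the dense model.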

Subject: ACL.2025 - Short Papers