As large language models (LLMs) scale, ensuring interpretability and privacy becomes critical. This talk addresses these interconnected challenges with novel approaches to model specialization and safety. First, we tackle the dense, distributed nature of LLM representations by casting Mixture-of-Experts (MoE) as a tensor decomposition, enabling specialized experts in a factorized space. Second, we argue that current neuron-level sparsity methods incur a severe accuracy-sparsity trade-off, and we propose a paradigm shift to layer-level sparsity with the Mixture of Decoders (MxD). We explain how MxD uses tensor factorization to expand dense layers into thousands of specialized, full-rank sublayers, demonstrating that it significantly outperforms alternatives in preserving model faithfulness and performance across LLMs of up to 3B parameters. Finally, we address privacy in open-weight models by proposing a scalable and certifiable algorithm that induces maximal uncertainty on protected instances, proving tight bounds that characterize the resulting privacy-utility trade-off.
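To make the factorized-expert idea concrete, here is a minimal NumPy sketch of one way an MoE layer can be cast as a tensor decomposition: each expert's weight matrix is built from shared factors plus a small per-expert coefficient vector, so mixing experts reduces to mixing coefficients. All names, shapes, and the specific decomposition (W_e = A diag(c_e) B) are illustrative assumptions, not the talk's actual MxD implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
d, r, E = 8, 8, 16  # toy sizes: model dim, factor rank, number of experts

# Shared factors: expert e's weight is W_e = A @ diag(C[e]) @ B.
# With r = d and full-rank A, B, each expert is itself full rank,
# yet adds only an r-dimensional coefficient vector of new parameters.
A = rng.standard_normal((d, r)) / np.sqrt(r)
B = rng.standard_normal((r, d)) / np.sqrt(d)
C = rng.standard_normal((E, r))  # per-expert coefficients

def expert_weight(e):
    """Materialize expert e's full d x d weight matrix (for checking)."""
    return A @ np.diag(C[e]) @ B

def factorized_moe(x, gate_logits):
    """Soft mixture over factorized experts for a single input vector x."""
    g = np.exp(gate_logits - gate_logits.max())
    g = g / g.sum()  # softmax gate over experts
    c_mix = g @ C    # mix coefficients instead of full matrices
    # (A * c_mix) broadcasts c_mix over columns, i.e. A @ diag(c_mix)
    return (A * c_mix) @ (B @ x)

x = rng.standard_normal(d)
gates = rng.standard_normal(E)
y = factorized_moe(x, gates)

# Sanity check: mixing coefficients equals mixing materialized experts.
g = np.exp(gates - gates.max()); g /= g.sum()
W_mix = sum(g[e] * expert_weight(e) for e in range(E))
assert np.allclose(y, W_mix @ x)
```

The efficiency point the sketch illustrates: the mixed layer is applied with two small matrix products rather than materializing any of the E dense expert matrices, which is what lets such a scheme scale to thousands of specialized sublayers.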