Split-Merge: Scalable and Memory-Efficient Merging of Expert LLMs

Authors: Sruthi Gorantla, Aditya Rawal, Devamanyu Hazarika, Kaixiang Lin, Mingyi Hong, Mahdi Namazifar

We introduce a zero-shot merging framework for large language models (LLMs) that consolidates specialized domain experts into a single model without any further training. Our core contribution is the use of relative task vectors, difference representations that encode each expert's unique traits with respect to a shared base model, to guide a principled and efficient merging process. By splitting parameters into common dimensions (averaged across experts) and complementary dimensions (unique to each expert), we balance generalization against specialization. We further devise a compression mechanism for the complementary parameters that retains only principal components and scalar multipliers per expert, thereby minimizing overhead. At inference, a dynamic router selects the most relevant domain, so that domain-specific precision is preserved. Experiments on code generation, mathematical reasoning, medical question answering, and instruction-following benchmarks confirm the versatility and effectiveness of our approach. Altogether, this framework enables adaptive and scalable LLMs that integrate specialized knowledge for improved zero-shot performance.
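The abstract gives no equations, so the following is only a minimal NumPy sketch of one way the described pipeline could be read, not the authors' implementation. It assumes that the common part is the plain mean of the task vectors, that the complementary parts are compressed with a truncated SVD over the stacked per-expert residuals (a shared set of principal components plus a few scalar coefficients per expert), and that the router is a simple nearest-centroid lookup. All function names (`split_task_vectors`, `compress_residuals`, `expert_weights`, `route`) and the centroid-based routing are hypothetical and introduced here for illustration.

```python
import numpy as np


def split_task_vectors(base, experts):
    """Split experts into a common (averaged) part and per-expert residuals.

    Assumption: task vector = expert weights - base weights, and the
    common part is the unweighted mean of the task vectors.
    """
    task_vectors = np.stack([w - base for w in experts])
    common = task_vectors.mean(axis=0)
    residuals = task_vectors - common  # complementary part of each expert
    return common, residuals


def compress_residuals(residuals, k):
    """Keep k shared principal components and k scalars per expert.

    Assumption: compression is a truncated SVD of the residuals,
    flattened to one row per expert.
    """
    n_experts = residuals.shape[0]
    flat = residuals.reshape(n_experts, -1)
    u, s, vt = np.linalg.svd(flat, full_matrices=False)
    components = vt[:k]         # shape (k, n_params), shared by all experts
    coeffs = u[:, :k] * s[:k]   # shape (n_experts, k), scalar multipliers
    return components, coeffs


def expert_weights(base, common, components, coeffs, expert_idx):
    """Rebuild the weights used when the router picks expert_idx."""
    residual = (coeffs[expert_idx] @ components).reshape(base.shape)
    return base + common + residual


def route(query_embedding, domain_centroids):
    """Hypothetical router: index of the centroid with the largest dot product."""
    return int(np.argmax(domain_centroids @ query_embedding))


# Toy usage with random weights standing in for one layer of three experts.
rng = np.random.default_rng(0)
base = rng.normal(size=(4, 3))
experts = [base + 0.1 * rng.normal(size=(4, 3)) for _ in range(3)]

common, residuals = split_task_vectors(base, experts)
components, coeffs = compress_residuals(residuals, k=2)
chosen = route(np.array([0.1, 0.9, 0.2]), np.eye(3))
merged = expert_weights(base, common, components, coeffs, chosen)
```

Under these assumptions the residuals sum to zero across experts, so with three experts two components already reconstruct every expert exactly; a smaller `k` trades reconstruction accuracy for storage.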

Subject: EMNLP.2025 - Main

2025.emnlp-main.1533@ACL
