2025.naacl-tutorial.4@ACL

Total: 1

#1 Knowledge Distillation for Language Models

Authors: Yuqiao Wen, Freda Shi, Lili Mou

Knowledge distillation (KD) aims to transfer the knowledge of a teacher (usually a large model) to a student (usually a small one). In this tutorial, our goal is to provide participants with a comprehensive understanding of the techniques and applications of KD for language models. After introducing the basic concepts, including intermediate-layer matching and prediction matching, we will present advanced techniques such as reinforcement learning-based KD and multi-teacher distillation. For applications, we will focus on KD for large language models (LLMs), covering topics ranging from LLM sequence compression to LLM self-distillation. The target audience is expected to know the basics of machine learning and NLP, but does not need to be familiar with the details of mathematical derivations or neural models.
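As a rough illustration of the prediction-matching idea named in the abstract, the sketch below computes a common form of KD loss: the KL divergence between temperature-softened teacher and student output distributions, mixed with the usual cross-entropy on gold labels. The function name, the temperature, and the mixing weight `alpha` are illustrative assumptions, not settings taken from the tutorial itself.

```python
import torch
import torch.nn.functional as F

def kd_prediction_matching_loss(student_logits, teacher_logits, labels,
                                temperature=2.0, alpha=0.5):
    """Prediction-matching KD: KL(teacher || student) on softened
    distributions, combined with cross-entropy on the hard labels.
    Temperature and alpha here are illustrative defaults."""
    # Soften both distributions with the same temperature.
    log_p_student = F.log_softmax(student_logits / temperature, dim=-1)
    p_teacher = F.softmax(teacher_logits / temperature, dim=-1)

    # KL divergence between teacher and student predictions; the T^2
    # factor keeps gradient magnitudes comparable across temperatures.
    kd_loss = F.kl_div(log_p_student, p_teacher,
                       reduction="batchmean") * temperature ** 2

    # Standard supervised loss on the gold labels.
    ce_loss = F.cross_entropy(student_logits, labels)

    return alpha * kd_loss + (1.0 - alpha) * ce_loss
```

For language models, the logits would typically be flattened over the sequence dimension (shape: tokens x vocabulary) before calling such a function; intermediate-layer matching, also mentioned above, would instead compare hidden states rather than output distributions.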

Subject: NAACL.2025 - Tutorial Abstracts