WuP4hwvLzo@OpenReview

Total: 1

#1 Right Time to Learn: Promoting Generalization via Bio-inspired Spacing Effect in Knowledge Distillation

Authors: Guanglong Sun, Hongwei Yan, Liyuan Wang, Qian Li, Bo Lei, Yi Zhong

Knowledge distillation (KD) is a powerful strategy for training deep neural networks (DNNs). While it was originally proposed to train a more compact “student” model from a large “teacher” model, many recent efforts have focused on adapting it to promote the generalization of the model itself, as in online KD and self KD. Here, we propose an easy-to-use and compatible strategy named Spaced KD to improve the effectiveness of both online KD and self KD, in which the student model distills knowledge from a teacher model trained a space interval ahead of it. This strategy is inspired by the spacing effect, a prominent theory in the field of biological learning and memory, which posits that appropriate intervals between learning trials can significantly enhance learning performance. We provide an in-depth theoretical and empirical analysis showing that the benefits of the proposed spacing effect in KD stem from seeking flat minima during stochastic gradient descent (SGD). We perform extensive experiments to demonstrate the effectiveness of our Spaced KD in improving the learning performance of DNNs (e.g., the additional performance gain is up to 2.31% and 3.34% on Tiny-ImageNet over online KD and self KD, respectively). Our code has been released on GitHub: https://github.com/SunGL001/Spaced-KD.
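To make the spacing idea concrete, below is a minimal self-KD sketch based only on the abstract: a teacher of the same architecture is kept a fixed number of training steps ahead, and the student distills from it with a standard temperature-scaled KL term. The names (`space`, `kd_weight`, `temperature`) and the exact scheduling are assumptions for illustration, not the authors' implementation; refer to the linked repository for the official code.

```python
# Illustrative sketch of spaced self-KD (assumed scheduling, not the official code).
from collections import deque

import torch
import torch.nn as nn
import torch.nn.functional as F


def spaced_self_kd(make_model, loader, space=50, kd_weight=1.0, temperature=4.0,
                   lr=0.1, device="cpu"):
    teacher = make_model().to(device)   # trained `space` steps ahead
    student = make_model().to(device)   # lags behind and distills from the teacher
    opt_t = torch.optim.SGD(teacher.parameters(), lr=lr, momentum=0.9)
    opt_s = torch.optim.SGD(student.parameters(), lr=lr, momentum=0.9)
    ce = nn.CrossEntropyLoss()
    buffer = deque()                    # batches waiting for the student

    for x, y in loader:
        x, y = x.to(device), y.to(device)

        # 1) The teacher always trains first, so it stays `space` steps ahead.
        opt_t.zero_grad()
        ce(teacher(x), y).backward()
        opt_t.step()

        # 2) The student trains on the batch seen `space` steps earlier,
        #    adding a KD term toward the now further-trained teacher.
        buffer.append((x, y))
        if len(buffer) > space:
            xs, ys = buffer.popleft()
            opt_s.zero_grad()
            s_logits = student(xs)
            with torch.no_grad():
                t_logits = teacher(xs)
            kd = F.kl_div(
                F.log_softmax(s_logits / temperature, dim=1),
                F.softmax(t_logits / temperature, dim=1),
                reduction="batchmean",
            ) * temperature ** 2
            (ce(s_logits, ys) + kd_weight * kd).backward()
            opt_s.step()

    return student
```

The buffered batches are one simple way to realize the "space interval ahead" described in the abstract; the paper's online-KD variant would use a separate teacher architecture rather than a second copy of the student.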

Subject: ICML.2025 - Poster