Total: 1
Adversarial robustness generally relies on large-scale architectures and datasets alongside extensive adversary generation, hindering resource-efficient deployment. For scalable solutions, adversarially robust knowledge distillation has emerged as a principle strategy, facilitating the transfer of adversarial robustness from a large-scale teacher model to a lightweight student model. However, existing works focus solely on sample-to-sample alignment of features or predictions between the teacher and student models, overlooking the vital role of their statistical alignment. Thus, we propose a novel adversarially robust knowledge distillation method that integrates the alignment of feature distributions between the teacher and student backbones under adversarial and clean sample sets. To motivate our idea, for an adversarially trained model (e.g., student or teacher), we show that the adversarially robust accuracy (evaluated on testing adversarial samples under an increasing perturbation radius) correlates negatively with the gap between the feature variance evaluated on testing adversarial samples and testing clean samples. Such a negative correlation exhibits a strong linear trend, suggesting that aligning the feature covariance of the student model toward the feature covariance of the teacher model should improve the adversarial robustness of the student model by reducing the variance gap. A similar trend is observed by reducing the variance gap between the gram matrices of the student and teacher models. Extensive evaluations highlight the state-of-the-art adversarial robustness and natural performance of our method across diverse datasets and distillation scenarios.