Knowledge Distillation with Ensemble Calibration
Date Issued
2023-12-15
Author(s)
Mishra, Ishan
Jain, Riyanshu
Viradiya, Dhruv
Patel, Divyam
Mishra, Deepak
DOI
10.1145/3627631.3627647
Abstract
Knowledge Distillation is a transfer learning and compression technique that aims to transfer hidden knowledge from a teacher model to a student model. However, this transfer often leads to poor calibration in the student model, which is problematic for high-risk applications that require well-calibrated models to capture prediction uncertainty. To address this issue, we propose a simple and novel technique that enhances the calibration of the student network by distilling from an ensemble of well-calibrated teacher models. We train multiple teacher models using various data-augmentation techniques such as cutout, mixup, CutMix, and AugMix, and use their ensemble for knowledge distillation. We evaluate our approach on different teacher-student combinations on the CIFAR-10 and CIFAR-100 datasets. Our results demonstrate that our technique improves calibration metrics such as the expected calibration error and overconfidence error while also increasing the accuracy of the student network.
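The abstract only outlines the approach, so the following is a minimal sketch of what distilling from an ensemble of teachers typically looks like in PyTorch: the soft targets are the averaged, temperature-scaled softmax of the teacher ensemble, combined with the usual cross-entropy on hard labels. The function name, the temperature T, and the weight alpha are illustrative assumptions, not values taken from the paper.

```python
# Sketch of knowledge distillation from an ensemble of teacher models.
# Names and hyperparameters (T, alpha) are illustrative, not from the paper.
import torch
import torch.nn.functional as F

def ensemble_distillation_loss(student_logits, teacher_logits_list, targets,
                               T=4.0, alpha=0.9):
    """KD loss whose soft targets are the averaged temperature-scaled
    softmax of an ensemble of teacher models."""
    # Average the teachers' temperature-scaled probability distributions.
    teacher_probs = torch.stack(
        [F.softmax(t / T, dim=1) for t in teacher_logits_list]
    ).mean(dim=0)
    # KL divergence between the student's soft predictions and the
    # ensemble soft targets, scaled by T^2 as in standard distillation.
    kd = F.kl_div(F.log_softmax(student_logits / T, dim=1),
                  teacher_probs, reduction="batchmean") * (T * T)
    # Standard cross-entropy on the ground-truth labels.
    ce = F.cross_entropy(student_logits, targets)
    return alpha * kd + (1.0 - alpha) * ce
```

In this sketch, each entry of teacher_logits_list would come from a teacher trained with a different augmentation scheme (e.g., cutout, mixup, CutMix, or AugMix); averaging their softened predictions is one common way to form an ensemble target for the student.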