#86
summarized by: Anonymous
What is this paper about?
Identifies that the performance gap between teacher and student in knowledge distillation arises not from model capacity but from the distillation data; proposes MixPatch, a novel data augmentation technique for knowledge distillation.
Novelty
MixPatch: a patch-wise linear combination of two images. The predictions of the teacher and the student guide both the mixing coefficient and the patch-size selection, so as to generate samples that are difficult for the student (a sketch follows below).
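Since the summary names only the mechanism, here is a minimal PyTorch sketch of what a patch-wise mix guided by teacher/student predictions could look like. The function name `mixpatch`, the `exp(-KL)` strength heuristic, and the random per-patch coefficients are illustrative assumptions, not the paper's exact rules:

```python
import torch
import torch.nn.functional as F

def mixpatch(x_a, x_b, teacher, student, patch_size=16):
    """Patch-wise linear combination of two image batches (N, C, H, W).

    Hypothetical sketch: per-patch coefficients are random, and the
    overall mixing strength grows as the student agrees more with the
    teacher, so "easy" samples are mixed more aggressively into harder
    ones. The paper's exact coefficient and patch-size rules are not
    given in this summary.
    """
    n, _, h, w = x_a.shape
    with torch.no_grad():
        p_t = F.softmax(teacher(x_a), dim=1)          # teacher class probs
        log_p_s = F.log_softmax(student(x_a), dim=1)  # student log-probs
        # Per-sample KL(teacher || student); low KL = easy for the student.
        kl = F.kl_div(log_p_s, p_t, reduction="none").sum(dim=1)
    # Easy samples (small KL) receive stronger mixing -> harder samples.
    strength = torch.exp(-kl).view(n, 1, 1, 1)        # in (0, 1]
    # One random coefficient per patch, broadcast to pixel resolution.
    gh, gw = max(h // patch_size, 1), max(w // patch_size, 1)
    lam = torch.rand(n, 1, gh, gw, device=x_a.device)
    lam = F.interpolate(lam, size=(h, w), mode="nearest") * strength
    return (1.0 - lam) * x_a + lam * x_b
```

Presumably the mixed batch is then used as distillation input, with the student trained against the teacher's outputs on the same mixed images; the per-patch (rather than per-image) coefficients are what distinguish this from plain mixup-style augmentation.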
Results
Substantially reduces the performance gap between student and teacher, and even enables small students to outperform large teachers.
Other (e.g., why was it accepted?)
- …
- …