#86
summarized by: Anonymous
Personalized Education: Blind Knowledge Distillation

What is this paper about?

Identifies that the performance gap between teacher and student stems not from model capacity but from the distillation data; proposes a novel data augmentation technique (MixPatch) for knowledge distillation tasks.

Novelty

MixPatch: a patch-wise linear combination of two images. The predictions of the teacher and the student guide both the mixing coefficient and the patch-size selection, so as to generate samples that are difficult for the student (see the sketch below).
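
A minimal sketch of the MixPatch idea in PyTorch. The candidate search over patch sizes, the random per-patch coefficient maps, and the KL-divergence hardness score are illustrative assumptions about how teacher/student predictions could guide the mix, not the paper's exact procedure.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def mixpatch(x_a, x_b, teacher, student,
             patch_sizes=(4, 8, 16), n_trials=4):
    """Patch-wise linear combination of two image batches (sketch).

    Draws several candidate per-patch coefficient maps at several patch
    sizes, scores each candidate by how much the student's prediction
    diverges from the teacher's, and keeps the hardest candidate for the
    student. All hyperparameters here are illustrative assumptions.
    """
    b, _, h, w = x_a.shape
    best_mix, best_gap = x_a, -1.0
    for p in patch_sizes:
        for _ in range(n_trials):
            # One mixing coefficient per patch, upsampled to pixel size.
            lam = torch.rand(b, 1, h // p, w // p, device=x_a.device)
            lam = F.interpolate(lam, size=(h, w), mode="nearest")
            # Patch-wise linear combination of the two images.
            mixed = lam * x_a + (1.0 - lam) * x_b
            # Hard sample = large teacher-student disagreement.
            p_t = F.softmax(teacher(mixed), dim=1)
            log_p_s = F.log_softmax(student(mixed), dim=1)
            gap = F.kl_div(log_p_s, p_t, reduction="batchmean").item()
            if gap > best_gap:
                best_gap, best_mix = gap, mixed
    return best_mix
```

Scoring candidates by the teacher-student KL divergence keeps the mix that the student currently handles worst while the teacher still provides a usable target, which matches the stated goal of generating difficult samples for the student.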

Results

Substantially reduces the performance gap between students and teachers, and even enables small students to outperform large teachers.

Other (e.g., why was it accepted?)