#86
summarized by: Anonymous
What is this paper about?
Identifies that the performance gap between teacher and student in knowledge distillation arises not from model capacity but from the distillation data; proposes MixPatch, a novel data augmentation technique for knowledge distillation.
Novelty
MixPatch: a patch-wise linear combination of two images. The predictions of the teacher and the student guide both the mixing coefficient and the patch-size selection, so as to generate samples that are difficult for the student (a sketch follows below).
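Since the summary names only the mechanism, here is a minimal PyTorch sketch of what a patch-wise mix guided by teacher/student predictions could look like. The function name `mixpatch`, the `exp(-KL)` strength heuristic, and the random per-patch coefficients are illustrative assumptions, not the paper's exact rules:

```python
import torch
import torch.nn.functional as F

def mixpatch(x_a, x_b, teacher, student, patch_size=16):
    """Patch-wise linear combination of two image batches (N, C, H, W).

    Hypothetical sketch: per-patch coefficients are random, and the
    overall mixing strength grows as the student agrees more with the
    teacher, so "easy" samples are mixed more aggressively into harder
    ones. The paper's exact coefficient and patch-size rules are not
    given in this summary.
    """
    n, _, h, w = x_a.shape
    with torch.no_grad():
        p_t = F.softmax(teacher(x_a), dim=1)          # teacher class probs
        log_p_s = F.log_softmax(student(x_a), dim=1)  # student log-probs
        # Per-sample KL(teacher || student); low KL = easy for the student.
        kl = F.kl_div(log_p_s, p_t, reduction="none").sum(dim=1)
    # Easy samples (small KL) receive stronger mixing -> harder samples.
    strength = torch.exp(-kl).view(n, 1, 1, 1)        # in (0, 1]
    # One random coefficient per patch, broadcast to pixel resolution.
    gh, gw = max(h // patch_size, 1), max(w // patch_size, 1)
    lam = torch.rand(n, 1, gh, gw, device=x_a.device)
    lam = F.interpolate(lam, size=(h, w), mode="nearest") * strength
    return (1.0 - lam) * x_a + lam * x_b
```

Presumably the mixed batch is then used as distillation input, with the student trained against the teacher's outputs on the same mixed images; the per-patch (rather than per-image) coefficients are what distinguish this from plain mixup-style augmentation.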
Results
Substantially reduces the performance gap between student and teacher, and even enables small students to outperform large teachers.
Other (e.g., why was it accepted?)
- …
- …