- …
- …
#51
summarized by: kaikai zhao
What is the paper about?
Proposes a solution to a central issue in online knowledge distillation: the student cannot benefit from mimicking the teacher when the distillation gap between them is too large;
Novelty
divides the teacher's training into two modes: a learning mode, in which the teacher keeps updating, and an expert mode, in which the teacher is frozen;
the distillation gap is quantified by the L1 norm of the gradient;
an adaptive threshold is defined to decide when to switch between the two modes (a minimal sketch of this switching logic follows below)
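The mode-switching idea can be illustrated with a minimal PyTorch-style sketch. The function names (`distillation_gap`, `update_teacher_mode`), the choice of taking the gradient with respect to the student's logits, the KL-based distillation loss, and the switching direction (freeze the teacher when the gap exceeds the threshold so the student can catch up) are assumptions for illustration; the summary only states that the gap is the L1 norm of the gradient and that an adaptive threshold triggers the switch, and the threshold's update rule is not shown here.

```python
import torch
import torch.nn.functional as F

def distillation_gap(student_logits, teacher_logits, temperature=1.0):
    """L1 norm of the gradient of the distillation loss.

    Assumption: the gradient is taken w.r.t. the student's logits; the
    summary only says "L1 norm of the gradient" without specifying.
    """
    student_logits = student_logits.detach().requires_grad_(True)
    kd_loss = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=1),
        F.softmax(teacher_logits.detach() / temperature, dim=1),
        reduction="batchmean",
    )
    (grad,) = torch.autograd.grad(kd_loss, student_logits)
    return grad.abs().sum().item()

def update_teacher_mode(teacher, gap, threshold):
    """Switch the teacher between expert mode (frozen) and learning mode.

    Assumption: expert mode (teacher frozen) is used when the gap exceeds
    the adaptive threshold, so the student can catch up; otherwise the
    teacher keeps training (learning mode).
    """
    expert = gap > threshold
    for p in teacher.parameters():
        p.requires_grad_(not expert)
    return "expert (frozen)" if expert else "learning"
```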
Results
achieves SOTA results compared with other online distillation methods on CIFAR-100, Tiny-ImageNet, and ImageNet;
Other notes (e.g., why was it accepted?)
experiments use CNN models only, with logit distillation only (no feature or attention distillation)
- …
- …