#51
summarized by: Kaikai Zhao
Switchable Online Knowledge Distillation

What is this paper about?

Proposes a solution to a known issue in online distillation: the student cannot benefit from mimicking the teacher when the distillation gap between them is too large.

Novelty

Splits the teacher's training into two modes: expert mode (teacher frozen) and learning mode (teacher keeps updating). The distillation gap is quantified by the l1 norm of the gradient of the student's distillation loss, and an adaptive threshold decides when to switch between the two modes.
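
The switching rule can be sketched roughly as below. This is a minimal PyTorch sketch based only on the summary above, not the paper's exact implementation: the names (`distillation_gap`, `select_mode`, `train_step`, `threshold`) are placeholders, the gap is taken as the l1 norm of the softmax difference (which equals the l1 norm of the gradient of the student's KL distillation loss w.r.t. its logits), the threshold is passed in as a plain value rather than the paper's adaptive schedule, and the teacher's learning-mode objective is simplified to cross-entropy.

```python
import torch.nn.functional as F

def distillation_gap(student_logits, teacher_logits):
    # l1 norm of the softmax difference; this equals the l1 norm of the
    # gradient of the student's KL distillation loss w.r.t. its logits.
    p_s = F.softmax(student_logits, dim=1)
    p_t = F.softmax(teacher_logits, dim=1)
    return (p_t - p_s).abs().sum(dim=1).mean()

def select_mode(gap, threshold):
    # Large gap -> expert mode (teacher frozen so the student can catch up);
    # small gap -> learning mode (teacher keeps training alongside the student).
    return "expert" if gap > threshold else "learning"

def train_step(student, teacher, x, y, opt_s, opt_t, threshold, T=4.0):
    s_logits = student(x)
    t_logits = teacher(x)

    gap = distillation_gap(s_logits.detach(), t_logits.detach())
    mode = select_mode(gap, threshold)

    # Student always trains on the task loss plus logit distillation from the teacher.
    kd = F.kl_div(F.log_softmax(s_logits / T, dim=1),
                  F.softmax(t_logits.detach() / T, dim=1),
                  reduction="batchmean") * (T * T)
    loss_s = F.cross_entropy(s_logits, y) + kd
    opt_s.zero_grad()
    loss_s.backward()
    opt_s.step()

    # Teacher is updated only in learning mode; in expert mode it stays frozen.
    if mode == "learning":
        loss_t = F.cross_entropy(t_logits, y)  # simplified stand-in for the paper's teacher loss
        opt_t.zero_grad()
        loss_t.backward()
        opt_t.step()

    return mode, gap.item()
```

In a full training loop, `train_step` would be called once per mini-batch, with the threshold recomputed each step according to the paper's adaptive rule.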

Results

Achieves state-of-the-art results compared with other online distillation methods on CIFAR-100, Tiny-ImageNet, and ImageNet.

Other (e.g., why was it accepted?)

Evaluated only on CNN models, and only with logit distillation (no feature or attention distillation).