- …
- …
#73
summarized by: kaikai zhao
What kind of paper is it?
Proposes a general feature-based distillation method that is effective across tasks (classification, detection, segmentation); generating the teacher's features from a masked student feature works better than directly mimicking the features.
Novelty
Randomly masks pixels of the student's feature map and forces the student to regenerate the teacher's full feature map through two 3x3 conv layers.
Results
Consistent improvements observed across various tasks.
Other (e.g., why was it accepted?)
Targets CNN models; an MSE loss is used for feature distillation (a minimal sketch is given below).
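
A minimal PyTorch sketch of the idea described in this summary: mask random spatial positions of the student's feature map, regenerate the teacher's full feature map with a block of two 3x3 conv layers, and supervise with an MSE loss. The class and argument names (MaskedGenerativeDistillation, mask_ratio, align) and the hyperparameter values are my own illustrative choices under these assumptions, not the paper's exact settings.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MaskedGenerativeDistillation(nn.Module):
    """Sketch of masked generative feature distillation (names/values assumed)."""

    def __init__(self, student_channels: int, teacher_channels: int, mask_ratio: float = 0.5):
        super().__init__()
        self.mask_ratio = mask_ratio
        # 1x1 conv to align channel dimensions when student and teacher differ
        self.align = (nn.Conv2d(student_channels, teacher_channels, kernel_size=1)
                      if student_channels != teacher_channels else nn.Identity())
        # generation block: two 3x3 conv layers (ReLU in between, assumed)
        self.generation = nn.Sequential(
            nn.Conv2d(teacher_channels, teacher_channels, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(teacher_channels, teacher_channels, kernel_size=3, padding=1),
        )

    def forward(self, feat_student: torch.Tensor, feat_teacher: torch.Tensor) -> torch.Tensor:
        # feat_student: (N, Cs, H, W); feat_teacher: (N, Ct, H, W), same spatial size assumed
        n, _, h, w = feat_student.shape
        feat_student = self.align(feat_student)
        # zero out roughly mask_ratio of the spatial positions of the student feature
        keep = (torch.rand(n, 1, h, w, device=feat_student.device) > self.mask_ratio).float()
        masked = feat_student * keep
        # force the student to regenerate the teacher's full feature map
        generated = self.generation(masked)
        return F.mse_loss(generated, feat_teacher)
```

Usage would add this loss, scaled by a weight, to the ordinary task loss during student training, e.g. `loss = task_loss + w * mgd(f_s, f_t)` with `mgd = MaskedGenerativeDistillation(256, 256)` (the weight and channel sizes here are placeholders).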
- …
- …