- …
- …
#209
summarized by: Anonymous
What kind of paper is it?
A new compression framework (MiniViT) for Vision Transformers that shares weights of the MSA and MLP modules across transformer layers (a minimal sketch follows).
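The sketch below illustrates the general idea of cross-layer weight sharing under assumptions of my own: groups of consecutive transformer blocks reuse one parameter set. All names (`SharedBlock`, `build_shared_stack`, `share_every`) and shapes are illustrative, not the authors' code.

```python
# Hypothetical sketch of cross-layer weight sharing, not MiniViT's actual code.
import torch
import torch.nn as nn

class SharedBlock(nn.Module):
    """One transformer block whose MSA/MLP parameters may be reused by others."""
    def __init__(self, dim: int, num_heads: int):
        super().__init__()
        self.norm1 = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm2 = nn.LayerNorm(dim)
        self.mlp = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(),
                                 nn.Linear(4 * dim, dim))

    def forward(self, x):
        h = self.norm1(x)
        x = x + self.attn(h, h, h, need_weights=False)[0]  # MSA with residual
        return x + self.mlp(self.norm2(x))                 # MLP with residual

def build_shared_stack(depth: int, share_every: int, dim: int, heads: int):
    """Build `depth` blocks where each group of `share_every` consecutive
    blocks points at the same SharedBlock object, i.e. the same weights."""
    blocks = []
    for i in range(depth):
        if i % share_every == 0:
            shared = SharedBlock(dim, heads)  # fresh parameters for this group
        blocks.append(shared)                 # later blocks reuse the object
    return nn.ModuleList(blocks)

stack = build_shared_stack(depth=12, share_every=2, dim=192, heads=3)
x = torch.randn(1, 197, 192)  # [batch, tokens, dim], DeiT-Tiny-like shapes
for blk in stack:
    x = blk(x)
print(x.shape)  # torch.Size([1, 197, 192]); 12 blocks backed by 6 parameter sets
```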
Novelty
Naive weight sharing causes training instability and performance degradation. MiniViT therefore applies weight transformations to increase diversity across the layers that share weights, and uses distillation on prediction logits, self-attention, and hidden states to improve performance (see the loss sketch below).
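As a rough illustration of the three-part distillation objective named above, here is a hedged sketch combining a KL term on prediction logits with MSE-style terms on per-layer attention maps and hidden states. The function name, loss weights, temperature, and the use of MSE are my assumptions, not the paper's exact recipe.

```python
# Hypothetical sketch of a three-part distillation loss; weights and MSE
# choices are illustrative assumptions, not MiniViT's published objective.
import torch
import torch.nn.functional as F

def minivit_style_distill_loss(student_logits, teacher_logits,
                               student_attn, teacher_attn,
                               student_hidden, teacher_hidden,
                               tau: float = 1.0,
                               w_logit: float = 1.0,
                               w_attn: float = 1.0,
                               w_hidden: float = 1.0):
    # 1) prediction-logit distillation: soft targets at temperature tau
    kl = F.kl_div(F.log_softmax(student_logits / tau, dim=-1),
                  F.softmax(teacher_logits / tau, dim=-1),
                  reduction="batchmean") * tau ** 2
    # 2) self-attention distillation: match attention maps layer by layer
    attn = sum(F.mse_loss(s, t) for s, t in zip(student_attn, teacher_attn))
    # 3) hidden-state distillation: match intermediate token representations
    hidden = sum(F.mse_loss(s, t) for s, t in zip(student_hidden, teacher_hidden))
    return w_logit * kl + w_attn * attn + w_hidden * hidden

# Toy usage with random tensors standing in for real model outputs
B, T, D, H, C, L = 2, 197, 192, 3, 1000, 12
loss = minivit_style_distill_loss(
    torch.randn(B, C), torch.randn(B, C),
    [torch.rand(B, H, T, T) for _ in range(L)],
    [torch.rand(B, H, T, T) for _ in range(L)],
    [torch.randn(B, T, D) for _ in range(L)],
    [torch.randn(B, T, D) for _ in range(L)],
)
print(loss.item())
```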
Results
The compressed models (Mini-DeiT, Mini-Swin) achieved performance competitive with or better than the original DeiT and Swin models, and generalized well to downstream tasks (classification, detection).
Other (e.g., why was it accepted?)
- …
- …