#80
summarized by: Anonymous
What is this paper about?
Can a Transformer backbone be trained from scratch? With a model architecture change, more training epochs, and gradient calibration, training from scratch can achieve performance comparable to ImageNet pre-training.
Novelty
Identifies that changing the model architecture from T-T-T-T to C-C-T-T (T and C stand for transformer and convolution blocks) is important for training a Transformer backbone from scratch.
Results
Training from scratch achieves competitive or better performance (COCO, Swin Transformer as the backbone combined with Faster R-CNN, FCOS, etc.).
Others (why it was accepted, etc.)
Replacing the early self-attention layers with CNN blocks introduces an inductive bias and mitigates the dependence on large-scale pre-training (the motivation for C-C-T-T); a rough sketch of such a hybrid backbone follows below.
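To make the C-C-T-T idea concrete, here is a minimal illustrative sketch of a hybrid backbone in PyTorch: the first two stages are convolutional (providing local inductive bias), the last two use self-attention. The module names (`ConvStage`, `TransformerStage`, `CCTTBackbone`), stage widths, depths, and head counts are assumptions for illustration, not the paper's actual configuration.

```python
# Illustrative sketch only: a hybrid backbone in the C-C-T-T spirit.
# All hyperparameters below are assumed, not taken from the paper.
import torch
import torch.nn as nn


class ConvStage(nn.Module):
    """Convolutional stage: downsample by 2, then a few conv blocks."""
    def __init__(self, in_ch, out_ch, depth=2):
        super().__init__()
        layers = [nn.Conv2d(in_ch, out_ch, 3, stride=2, padding=1),
                  nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True)]
        for _ in range(depth - 1):
            layers += [nn.Conv2d(out_ch, out_ch, 3, padding=1),
                       nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True)]
        self.body = nn.Sequential(*layers)

    def forward(self, x):
        return self.body(x)


class TransformerStage(nn.Module):
    """Transformer stage: downsample by 2, flatten to tokens, run encoder layers."""
    def __init__(self, in_ch, out_ch, depth=2, heads=4):
        super().__init__()
        self.down = nn.Conv2d(in_ch, out_ch, 3, stride=2, padding=1)
        layer = nn.TransformerEncoderLayer(d_model=out_ch, nhead=heads,
                                           dim_feedforward=out_ch * 4,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=depth)

    def forward(self, x):
        x = self.down(x)                       # (B, C, H, W)
        b, c, h, w = x.shape
        tokens = x.flatten(2).transpose(1, 2)  # (B, H*W, C)
        tokens = self.encoder(tokens)
        return tokens.transpose(1, 2).reshape(b, c, h, w)


class CCTTBackbone(nn.Module):
    """C-C-T-T: convolution stages first, self-attention stages later."""
    def __init__(self):
        super().__init__()
        self.stages = nn.Sequential(
            ConvStage(3, 64),             # stage 1: C
            ConvStage(64, 128),           # stage 2: C
            TransformerStage(128, 256),   # stage 3: T
            TransformerStage(256, 512),   # stage 4: T
        )

    def forward(self, x):
        return self.stages(x)


if __name__ == "__main__":
    feats = CCTTBackbone()(torch.randn(1, 3, 224, 224))
    print(feats.shape)  # torch.Size([1, 512, 14, 14])
```

In a detection setup such as the one described above (Faster R-CNN, FCOS), a backbone like this would feed its stage outputs into the detector's neck and heads; that wiring is omitted here.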