- …
- …
#143
summarized by: Anonymous
What is this paper about?
Identifies the confounding variables in comparisons between ConvNets and ViTs, asks how ViT design choices affect ConvNets, and modernizes a ConvNet to close the gap between pre-ViT and post-ViT ConvNets.
Novelty
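Improves the ConvNet through a sequence of changes: an enhanced training recipe; a patchify stem; a changed stage compute ratio; depthwise convolution; an inverted bottleneck (inverting the dimensions); increased depth; moving the depthwise convolution up in the block; fewer activation and normalization layers; replacing ReLU with GELU; and a larger kernel size.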
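
A rough sketch of how these changes combine in one block, written as a parameter count in plain Python rather than as a trainable model. The function names are hypothetical. The 4x4 stride-4 stem, 7x7 kernel, 4x expansion, and the stage depths and widths are assumptions based on the ConvNeXt-T configuration as I understand it; none of these numbers are stated in this summary. Layer scale, the downsampling layers, and the classifier head are left out.

```python
# Illustrative sketch only: function names and the tiny-variant configuration
# below are assumptions, not taken from the summary.

DEPTHS = (3, 3, 9, 3)         # assumed blocks per stage (the changed stage ratio)
DIMS = (96, 192, 384, 768)    # assumed channel width per stage


def stem_output_size(image_size: int, patch: int = 4) -> int:
    """Patchify stem: a patch x patch convolution with stride equal to patch."""
    return image_size // patch


def block_layers(dim: int, kernel_size: int = 7, expansion: int = 4):
    """List (layer name, parameter count) for one block, in execution order."""
    hidden = expansion * dim  # inverted bottleneck: the middle is wider
    return [
        # Depthwise conv comes first ("moved up"), with a large kernel.
        ("depthwise_conv", dim * kernel_size * kernel_size + dim),
        # Single normalization layer per block (scale and shift).
        ("layer_norm", 2 * dim),
        ("pointwise_expand", dim * hidden + hidden),
        # Single activation per block, GELU instead of ReLU.
        ("gelu", 0),
        ("pointwise_project", hidden * dim + dim),
    ]


def block_params(dim: int) -> int:
    """Total parameters of one block of the given width."""
    return sum(count for _, count in block_layers(dim))


def total_block_params() -> int:
    """Parameters in all blocks; stem, downsampling and head are left out."""
    return sum(depth * block_params(dim) for depth, dim in zip(DEPTHS, DIMS))


if __name__ == "__main__":
    print("stem output for a 224x224 input:", stem_output_size(224))
    for depth, dim in zip(DEPTHS, DIMS):
        print(f"dim={dim}: {depth} blocks x {block_params(dim)} parameters")
    print("parameters in blocks:", total_block_params())
```

In this count, the two pointwise layers account for nearly all of a block's parameters, while the 7x7 depthwise convolution adds comparatively few. That is one way to see why enlarging the kernel is cheap once the convolution is depthwise.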
Results
The proposed ConvNeXt models compete favorably with state-of-the-art hierarchical vision Transformers across multiple computer vision benchmarks.
Other (why was it accepted, etc.)
- …
- …