#143
summarized by: Anonymous
A ConvNet for the 2020s

What kind of paper is it?

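The paper identifies the confounding variables behind the performance gap between ConvNets and vision transformers (ViTs), examines how ViT design choices can carry over to ConvNets, and shows how to modernize a standard ConvNet to close the gap between pre-ViT and post-ViT designs.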

Novelty

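Improves a standard ConvNet through a sequence of changes: an enhanced training recipe; a patchify stem; an adjusted stage compute ratio; depthwise convolution with increased width; an inverted bottleneck (inverting the block dimensions); moving the depthwise conv layer up; fewer activation and normalization layers; replacing ReLU with GELU; and a larger kernel size.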
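
The block-level changes listed above (depthwise conv moved up, inverted bottleneck, a single normalization and a single activation, GELU, a 7x7 kernel) and two macro changes (patchify stem, stage ratio) can be sketched in NumPy. This is a minimal forward-only sketch under stated assumptions, not the authors' implementation: all function names are hypothetical, weights are random, there is no batch dimension (channels-last `(H, W, C)` layout), GELU uses the tanh approximation, and the depths/widths in `stage_shapes` are assumed to match the ConvNeXt-T configuration.

```python
import numpy as np


def depthwise_conv2d(x, w, b):
    # Depthwise conv: each channel is filtered by its own k x k kernel.
    # x: (H, W, C), w: (k, k, C), b: (C,); stride 1, zero "same" padding (k odd).
    k = w.shape[0]
    p = k // 2
    H, W, _ = x.shape
    xp = np.pad(x, ((p, p), (p, p), (0, 0)))  # zero padding on spatial axes only
    out = np.zeros(x.shape, dtype=float)
    for i in range(k):
        for j in range(k):
            out += xp[i:i + H, j:j + W, :] * w[i, j, :]
    return out + b


def layer_norm(x, gamma, beta, eps=1e-6):
    # LayerNorm over the channel (last) axis at every spatial position.
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps) * gamma + beta


def gelu(x):
    # Assumption: tanh approximation of GELU (exact GELU uses erf).
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x ** 3)))


def convnext_block(x, params):
    # depthwise 7x7 -> LN -> 1x1 expand (4C) -> GELU -> 1x1 project (C) -> residual
    h = depthwise_conv2d(x, params["dw_w"], params["dw_b"])
    h = layer_norm(h, params["ln_g"], params["ln_b"])
    h = h @ params["pw1_w"] + params["pw1_b"]  # 1x1 conv == matmul over channels
    h = gelu(h)
    h = h @ params["pw2_w"] + params["pw2_b"]
    return x + h


def init_params(C, k=7, expansion=4, seed=0):
    # Hypothetical random initialization, for shape checking only.
    rng = np.random.default_rng(seed)
    return {
        "dw_w": rng.normal(0.0, 0.02, (k, k, C)),
        "dw_b": np.zeros(C),
        "ln_g": np.ones(C),
        "ln_b": np.zeros(C),
        "pw1_w": rng.normal(0.0, 0.02, (C, expansion * C)),
        "pw1_b": np.zeros(expansion * C),
        "pw2_w": rng.normal(0.0, 0.02, (expansion * C, C)),
        "pw2_b": np.zeros(C),
    }


def stage_shapes(img=224, depths=(3, 3, 9, 3), dims=(96, 192, 384, 768)):
    # Patchify stem (4x4, stride 4) gives img/4; each later stage halves the side.
    # Returns (blocks, height, width, channels) per stage.
    side = img // 4
    shapes = []
    for i, (d, c) in enumerate(zip(depths, dims)):
        if i > 0:
            side //= 2
        shapes.append((d, side, side, c))
    return shapes


if __name__ == "__main__":
    x = np.random.default_rng(1).normal(size=(14, 14, 96))
    y = convnext_block(x, init_params(96))
    print(y.shape)  # (14, 14, 96)
    print(stage_shapes())  # [(3, 56, 56, 96), (3, 28, 28, 192), (9, 14, 14, 384), (3, 7, 7, 768)]
```

The channels-last layout is a design choice for the sketch: it turns both the LayerNorm over channels and the 1x1 convolutions into plain operations on the last axis.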

Results

The proposed ConvNeXt models compete favorably with state-of-the-art hierarchical vision transformers across multiple computer vision benchmarks (ImageNet classification, COCO object detection, and ADE20K semantic segmentation).

Other (why was it accepted? etc.)