#160
summarized by: Anonymous
MetaFormer Is Actually What You Need for Vision

What is this paper about?

What makes Transformers perform well? The paper argues that it is the general architecture of the Transformer, rather than the specific token-mixer module (e.g., attention).

Novelty

Proposes 'MetaFormer', a general architecture abstracted from the Transformer in which the token mixer is left unspecified, and shows that even a naive token mixer (simple spatial pooling, yielding the model PoolFormer) achieves promising results. A sketch of the block structure follows.
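A minimal PyTorch sketch of the MetaFormer block may help: the token mixer is a pluggable module inside an otherwise fixed residual structure. The residual layout follows the paper; details such as the normalization choice and the 1x1-conv channel MLP here are illustrative assumptions, not the definitive implementation.

```python
import torch
import torch.nn as nn

class MetaFormerBlock(nn.Module):
    """MetaFormer block: residual token mixing followed by a
    residual channel MLP; the token mixer is left pluggable."""
    def __init__(self, dim, token_mixer, mlp_ratio=4):
        super().__init__()
        self.norm1 = nn.GroupNorm(1, dim)   # LayerNorm-like norm over channels
        self.token_mixer = token_mixer      # e.g., attention, pooling, MLP
        self.norm2 = nn.GroupNorm(1, dim)
        # channel MLP implemented with 1x1 convolutions (assumption)
        self.mlp = nn.Sequential(
            nn.Conv2d(dim, dim * mlp_ratio, 1),
            nn.GELU(),
            nn.Conv2d(dim * mlp_ratio, dim, 1),
        )

    def forward(self, x):                   # x: (B, C, H, W)
        x = x + self.token_mixer(self.norm1(x))  # token-mixing sublayer
        x = x + self.mlp(self.norm2(x))          # channel-MLP sublayer
        return x
```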

Results

Replacing the attention module in the Transformer with a simple spatial pooling operation as the token mixer (the resulting model is named PoolFormer) yields competitive performance on image classification, object detection, and semantic segmentation tasks.
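A sketch of the pooling token mixer, following the paper's description (average pooling with the input subtracted, since the surrounding block already adds a residual connection); the default pool size and padding choices here are assumptions:

```python
import torch
import torch.nn as nn

class PoolingTokenMixer(nn.Module):
    """Average pooling as a token mixer. The input is subtracted
    because the MetaFormer block adds a residual connection
    around the mixer."""
    def __init__(self, pool_size=3):
        super().__init__()
        self.pool = nn.AvgPool2d(
            pool_size, stride=1, padding=pool_size // 2,
            count_include_pad=False)

    def forward(self, x):  # x: (B, C, H, W)
        return self.pool(x) - x

# Usage with the block sketched above; spatial shape is preserved.
block = MetaFormerBlock(dim=64, token_mixer=PoolingTokenMixer())
out = block(torch.randn(1, 64, 56, 56))  # -> (1, 64, 56, 56)
```

Note that the mixer has no learnable parameters, which is the point of the experiment: if a parameter-free pooling operation suffices, the credit for performance goes to the surrounding architecture rather than the mixer.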

Other (e.g., why was it accepted?)