AdaViT: Adaptive Vision Transformers for Efficient Image Recognition

#138

Summarized by: Anonymous

Lingchen Meng; Hengduo Li; Bor-Chun Chen; Shiyi Lan; Zuxuan Wu; Yu-Gang Jiang; Ser-Nam Lim

What is this paper about?

It speeds up inference for image recognition by adaptively reducing, based on the input image, the number of patches, self-attention heads, and Transformer blocks that are used.

Novelty

A small decision network inside the Transformer block, framed as a binary classification task, decides which patches, self-attention heads, and Transformer blocks to keep or skip.
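To make the keep/skip idea concrete, here is a minimal sketch of per-block binary decisions. It is not the authors' implementation: the function names (`decide`, `block_policy`), the single linear layer per decision type, the mean pooling of tokens for head and block decisions, and the fixed 0.5 threshold are all assumptions made for illustration. Training such hard decisions would also need a differentiable relaxation, which this sketch leaves out.

```python
import math

def linear(x, weights, bias):
    """Compute y = W x + b for one input vector x (W given as a list of rows)."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def decide(x, weights, bias, threshold=0.5):
    """Hypothetical decision head: one binary keep (1) / skip (0) flag per logit."""
    return [1 if sigmoid(z) >= threshold else 0
            for z in linear(x, weights, bias)]

def block_policy(tokens, patch_w, patch_b, head_w, head_b, block_w, block_b):
    """Return (keep_patch, keep_head, keep_block) for one Transformer block."""
    # One decision per patch token.
    keep_patch = [decide(t, patch_w, patch_b)[0] for t in tokens]
    # Assumption: head and block decisions use the mean of all tokens.
    dim = len(tokens[0])
    pooled = [sum(t[i] for t in tokens) / len(tokens) for i in range(dim)]
    keep_head = decide(pooled, head_w, head_b)       # one flag per attention head
    keep_block = decide(pooled, block_w, block_b)[0]  # one flag for the whole block
    return keep_patch, keep_head, keep_block

# Toy example: 3 patch tokens of dimension 2, 2 attention heads.
tokens = [[1.0, 0.0], [-1.0, 0.0], [0.5, 0.5]]
policy = block_policy(
    tokens,
    patch_w=[[2.0, 0.0]], patch_b=[0.0],
    head_w=[[1.0, 1.0], [-1.0, -1.0]], head_b=[0.0, 0.0],
    block_w=[[1.0, 0.0]], block_b=[0.0],
)
print(policy)  # ([1, 0, 1], [1, 0], 1)
```

In this toy run the second patch and the second head are skipped while the block itself is kept; the compute saved grows with the number of zeros in the policy.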

Results

The method achieves more than a 2× improvement in efficiency over state-of-the-art vision transformers, with only a 0.8% drop in accuracy on ImageNet.

Other (e.g., why was it accepted?)

The images used on this page are taken from the paper.