#138
Summarized by: Anonymous
AdaViT: Adaptive Vision Transformers for Efficient Image Recognition

What kind of paper is this?

Improves inference speed for image recognition by reducing, depending on the input image, the number of patches, self-attention heads, and Transformer blocks that are used.

Novelty

Uses a small decision network (trained as a binary classification task) inside the Transformer block to decide how to reduce the number of patches, self-attention heads, and Transformer blocks.
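As a rough illustration of the idea, the sketch below shows how binary keep/drop decisions for patches, heads, and a whole block could be produced from a block's input features. It is not the paper's implementation: the function names (`decide`, `adaptive_block_policy`), the single linear layer per decision, the mean-pooled feature used for the head and block decisions, and the fixed 0.5 threshold are all assumptions made for this example, and how the decisions are trained is not covered.

```python
import math


def sigmoid(x):
    """Logistic function mapping a logit to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def linear(weights, bias, x):
    """Plain linear layer: one output per row of `weights`."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def decide(weights, bias, feature, threshold=0.5):
    """Hypothetical decision head: returns a 0/1 keep decision per output unit."""
    return [1 if sigmoid(z) >= threshold else 0
            for z in linear(weights, bias, feature)]


def adaptive_block_policy(patch_feats, patch_w, patch_b,
                          head_w, head_b, block_w, block_b):
    """Return (keep_patches, keep_heads, keep_block) for one Transformer block.

    Assumption: each patch is judged from its own embedding, while the head
    and block decisions use the mean of all patch embeddings.
    """
    keep_patches = [decide(patch_w, patch_b, f)[0] for f in patch_feats]

    dim = len(patch_feats[0])
    pooled = [sum(f[i] for f in patch_feats) / len(patch_feats)
              for i in range(dim)]

    keep_heads = decide(head_w, head_b, pooled)      # one 0/1 flag per head
    keep_block = decide(block_w, block_b, pooled)[0]  # 0 means skip the block
    return keep_patches, keep_heads, keep_block


# Toy usage: 3 patches with 2-dimensional embeddings, 2 attention heads.
patches = [[2.0, 0.0], [-3.0, 1.0], [0.5, 0.5]]
policy = adaptive_block_policy(
    patches,
    patch_w=[[1.0, 0.0]], patch_b=[0.0],
    head_w=[[0.0, 1.0], [0.0, -1.0]], head_b=[0.0, 0.0],
    block_w=[[0.0, 0.0]], block_b=[1.0],
)
print(policy)
```

At inference time, a block would then process only the patches and heads flagged with 1, or be skipped entirely when the block flag is 0, which is where the compute saving would come from.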

Results

Achieved more than a 2× improvement in efficiency compared with state-of-the-art vision transformers, with only a 0.8% drop in accuracy on ImageNet.

Other (e.g., why was it accepted?)