#140
summarized by : Anonymous
A-ViT: Adaptive Tokens for Efficient Vision Transformer

What kind of paper is this?

Proposes A-ViT, an input-dependent adaptive inference mechanism for vision transformers. It speeds up inference on recognition tasks by discarding redundant patches (tokens) as the input passes through the network.

Novelty

Uses the first value in each token to estimate that token's halting score. A ponder loss on the accumulated halting scores encourages early stopping, together with a distributional prior that steers halting towards a target depth of 9 layers.
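The halting rule described above can be sketched roughly as follows. This is a minimal NumPy illustration, not the authors' implementation: the names `adaptive_halting`, `gamma`, `beta`, and `eps` are assumptions, and the per-layer token embeddings are taken as a precomputed array for simplicity, whereas a real model would stop computing a token once it halts. The distributional prior is not shown.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def adaptive_halting(tokens_per_layer, gamma=1.0, beta=0.0, eps=0.01):
    """Sketch of per-token halting from accumulated halting scores.

    tokens_per_layer: array of shape (num_layers, num_tokens, dim) holding
        the token embeddings after each layer (an assumption made here so
        the example stays self-contained).
    gamma, beta, eps: hypothetical scale, shift, and halting margin.

    Returns (halt_layer, ponder): the 1-based layer at which each token
    halts, and a per-token ponder value (halt layer plus remainder) whose
    mean could serve as a ponder-style loss.
    """
    num_layers, num_tokens, _ = tokens_per_layer.shape
    cumulative = np.zeros(num_tokens)            # accumulated halting score
    remainder = np.ones(num_tokens)              # 1 - score before halting
    active = np.ones(num_tokens, dtype=bool)     # tokens still processed
    halt_layer = np.full(num_tokens, num_layers)

    for layer in range(num_layers):
        # Halting score comes from the first value of each token embedding.
        h = sigmoid(gamma * tokens_per_layer[layer, :, 0] + beta)
        if layer == num_layers - 1:
            h = np.ones(num_tokens)              # force halting at the end
        new_cumulative = cumulative + h
        halting_now = active & (new_cumulative >= 1.0 - eps)
        remainder[halting_now] = 1.0 - cumulative[halting_now]
        halt_layer[halting_now] = layer + 1
        cumulative = np.where(active, new_cumulative, cumulative)
        active &= ~halting_now

    ponder = halt_layer + remainder
    return halt_layer, ponder
```

Tokens whose first value is large halt after a single layer, while tokens with a strongly negative first value survive to the last layer; minimizing the mean of `ponder` pushes tokens to halt earlier.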

Results

Improves the throughput of DeiT-Tiny by 62% and DeiT-Small by 38%, with only a 0.3% accuracy drop on ImageNet1K.

Other (why was it accepted? etc.)