A-ViT: Adaptive Tokens for Efficient Vision Transformer

#140

Summarized by: Anonymous

Hongxu Yin; Arash Vahdat; Jose M. Alvarez; Arun Mallya; Jan Kautz; Pavlo Molchanov

What kind of paper is it?

Speeds up inference on recognition tasks by dropping redundant patch tokens, using an input-dependent adaptive inference mechanism for vision transformers.
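A minimal sketch of per-token adaptive halting, in plain Python, to show how fewer tokens are processed at deeper layers. Assumptions that are not stated in this summary: the halting score is `sigmoid(gamma * t[0] + beta)` with hypothetical values `gamma=5`, `beta=-10`; a token halts once its cumulative score reaches `1 - eps` (hypothetical `eps=0.01`); halting is forced at the last layer; and the first embedding value of each token at each layer is passed in directly instead of being produced by real transformer blocks. The function names are illustrative only.

```python
import math


def sigmoid(x):
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def halting_score(first_value, gamma=5.0, beta=-10.0):
    # Hypothetical halting head: scale and shift the token's first value.
    return sigmoid(gamma * first_value + beta)


def simulate_halting(first_values, eps=0.01):
    """first_values[l][k]: first embedding value of token k at layer l.

    Returns (halt_layer, active_per_layer): the 1-indexed layer at which each
    token halts, and how many tokens are still processed at each layer.
    """
    num_layers = len(first_values)
    num_tokens = len(first_values[0])
    cumulative = [0.0] * num_tokens
    halt_layer = [None] * num_tokens
    active_per_layer = []
    for l in range(num_layers):
        # Halted tokens are masked out and skip all later layers.
        active = [k for k in range(num_tokens) if halt_layer[k] is None]
        active_per_layer.append(len(active))
        for k in active:
            # Assumed: force every remaining token to halt at the last layer.
            h = 1.0 if l == num_layers - 1 else halting_score(first_values[l][k])
            cumulative[k] += h
            if cumulative[k] >= 1.0 - eps:
                halt_layer[k] = l + 1
    return halt_layer, active_per_layer


if __name__ == "__main__":
    values = [[3.0, 0.0, 0.0], [0.0, 3.0, 0.0], [0.0, 0.0, 0.0]]
    print(simulate_halting(values))
```

With these inputs the three tokens halt after layers 1, 2 and 3, so the layers process 3, 2 and 1 tokens (6 token-layer evaluations instead of 9); that reduction is where the throughput gain comes from.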

Novelty

Estimates a per-token halting score from the first value of each token's embedding; a ponder loss on the accumulated halting scores encourages early stopping, and a distributional prior steers the halting depth toward a target of 9 layers.
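A sketch of the two regularizers, assuming an ACT-style formulation: for token k, N_k is the first layer at which the cumulative halting score reaches `1 - eps`, the remainder is r_k = 1 minus the cumulative score before N_k, and the ponder loss averages N_k + r_k over tokens. The distributional prior is modeled here as a discretized Gaussian centered on the target depth (hypothetical `std=1.0`) and compared with the normalized per-layer mean halting score via KL divergence. The Gaussian form, the KL direction, and all function names are assumptions, not details confirmed by this summary.

```python
import math


def ponder_loss(halting_scores, eps=0.01):
    """halting_scores[k][l]: halting score of token k at layer l (0-indexed).

    Assumed ACT-style loss: mean over tokens of N_k + r_k.
    """
    total = 0.0
    for scores in halting_scores:
        cumulative = 0.0
        for n, h in enumerate(scores, start=1):
            if n == len(scores):
                h = 1.0  # assumed: halting is forced at the last layer
            if cumulative + h >= 1.0 - eps:
                remainder = 1.0 - cumulative  # mass left at the halting layer
                total += n + remainder
                break
            cumulative += h
    return total / len(halting_scores)


def gaussian_prior(num_layers, target_depth, std=1.0):
    # Hypothetical prior: discretized Gaussian over 1-indexed layers, normalized.
    weights = [math.exp(-0.5 * ((l - target_depth) / std) ** 2)
               for l in range(1, num_layers + 1)]
    s = sum(weights)
    return [w / s for w in weights]


def halting_distribution(halting_scores):
    # Per-layer mean halting score over tokens, normalized to sum to 1.
    num_layers = len(halting_scores[0])
    per_layer = [sum(s[l] for s in halting_scores) / len(halting_scores)
                 for l in range(num_layers)]
    total = sum(per_layer)
    return [p / total for p in per_layer]


def kl_divergence(p, q, tiny=1e-12):
    # KL(p || q); terms with p_i == 0 contribute nothing.
    return sum(pi * math.log((pi + tiny) / (qi + tiny))
               for pi, qi in zip(p, q) if pi > 0)


if __name__ == "__main__":
    scores = [[0.2, 0.3, 0.6], [1.0, 0.0, 0.0]]
    print(ponder_loss(scores))
    prior = gaussian_prior(12, 9)
    print(kl_divergence(halting_distribution([[0.1] * 12]), prior))
```

For the toy scores, the first token halts at layer 3 with remainder 0.5 and the second at layer 1 with remainder 1.0, so the ponder loss is (3.5 + 2.0) / 2 = 2.75. With 12 layers (as in DeiT) and `target_depth=9`, the prior peaks at the 9th layer; during training both terms would be weighted and added to the classification loss, with the weights left unspecified here.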

Results

Improves the throughput of DeiT-Tiny by 62% and DeiT-Small by 38% with only a 0.3% accuracy drop on ImageNet-1K.
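The throughput gains can be converted into time saved per image: if throughput rises by a fraction g, the time per image falls to 1/(1 + g) of the baseline. This assumes throughput is measured as images per second under identical hardware and batch settings; the helper name `latency_reduction` is hypothetical.

```python
def latency_reduction(throughput_gain):
    """Fraction of per-image time saved when throughput rises by throughput_gain.

    throughput_gain is a fraction, e.g. 0.62 for a 62% increase.
    """
    return 1.0 - 1.0 / (1.0 + throughput_gain)


if __name__ == "__main__":
    for name, gain in [("DeiT-Tiny", 0.62), ("DeiT-Small", 0.38)]:
        print(f"{name}: {1 + gain:.2f}x throughput, "
              f"{latency_reduction(gain):.1%} less time per image")
```

This prints 1.62x / 38.3% for DeiT-Tiny and 1.38x / 27.5% for DeiT-Small: a 62% throughput gain corresponds to roughly 38% less time per image, not 62%.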

Other notes (e.g., why was it accepted?)

The images used on this page are taken from the paper.