#30
Summarized by: Kaikai Zhao
Vision Transformer With Deformable Attention

What is this paper about?

Proposes the Deformable Attention Transformer (DAT), a general vision backbone that reduces the computation cost of dense attention. Its attention is sparse and learned in a data-dependent way, which also lets it model geometric transformations.

Novelty

1) The first backbone built on deformable self-attention; 2) data-dependent attention: deformed sampling points are predicted from the queries by an offset network, and the keys and values are computed from the sampled features.
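To make point 2) concrete, here is a minimal single-head sketch of the idea in numpy, simplified to a 1-D token sequence (the paper operates on 2-D feature maps, with multi-head attention and bilinear sampling): an offset network deforms uniform reference points based on the queries, and keys/values are built only from the features sampled at those deformed points. All names and shapes below are illustrative, not the paper's implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def deformable_attention(x, W_q, W_k, W_v, W_off, n_ref):
    """Sketch: sparse attention over n_ref deformed sampling points.
    x: (L, d) token features. For simplicity the offsets are shared
    across queries (an assumption; the paper shares them per head)."""
    L, d = x.shape
    q = x @ W_q                                # queries, (L, d)
    # Offset network: predict offsets from (pooled) query features.
    offsets = np.tanh(q.mean(axis=0) @ W_off)  # (n_ref,), in [-1, 1]
    # Uniform reference points, shifted by the learned offsets.
    ref = np.linspace(0, L - 1, n_ref)
    pos = np.clip(ref + offsets * (L / n_ref), 0, L - 1)
    # Sample features at the deformed positions (linear interpolation,
    # the 1-D analogue of bilinear sampling on a feature map).
    lo = np.floor(pos).astype(int)
    hi = np.minimum(lo + 1, L - 1)
    w = (pos - lo)[:, None]
    sampled = (1 - w) * x[lo] + w * x[hi]      # (n_ref, d)
    # Keys/values come from the sampled features, not from all tokens,
    # so attention costs O(L * n_ref) instead of O(L^2).
    k, v = sampled @ W_k, sampled @ W_v
    attn = softmax(q @ k.T / np.sqrt(d))       # (L, n_ref)
    return attn @ v                            # (L, d)

rng = np.random.default_rng(0)
L, d, n_ref = 16, 8, 4
x = rng.normal(size=(L, d))
W_q = rng.normal(size=(d, d)) * 0.1
W_k = rng.normal(size=(d, d)) * 0.1
W_v = rng.normal(size=(d, d)) * 0.1
W_off = rng.normal(size=(d, n_ref)) * 0.1
out = deformable_attention(x, W_q, W_k, W_v, W_off, n_ref)
print(out.shape)  # (16, 8)
```

Each query attends to only n_ref sampled locations rather than all L tokens, which is where the cost reduction over dense attention comes from.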

Results

SOTA results on image classification (ImageNet-1K), object detection (COCO), and semantic segmentation (ADE20K).

Other (e.g., why was it accepted?)