- …
- …
#30
summarized by: Kaikai Zhao
What kind of paper is this?
Proposes the Deformable Attention Transformer (DAT), a general vision backbone that reduces the computational cost of dense attention. Its deformable attention learns sparse attention patterns in a data-dependent way and can model geometric transformations.
Novelty
1) The first vision backbone built on deformable self-attention. 2) Data-dependent attention: an offset network predicts deformed sampling points from the queries, and the keys and values are computed from the features sampled at those points (see the sketch below).
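To make the data flow concrete, here is a minimal single-head PyTorch sketch of the idea, not the authors' implementation: an offset network predicts offsets from pooled query features, features are bilinearly sampled at the deformed points, and keys/values are projected from those samples. The module names, the query pooling, and the `grid_size`/`offset_range` hyperparameters are illustrative assumptions; the paper's multi-head attention, offset groups, and relative position bias are omitted.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DeformableAttention(nn.Module):
    """Single-head deformable attention over a 2D feature map (sketch)."""
    def __init__(self, dim: int, grid_size: int = 8, offset_range: float = 2.0):
        super().__init__()
        self.scale = dim ** -0.5
        self.grid_size = grid_size          # grid_size**2 reference points
        self.offset_range = offset_range    # max offset, in grid cells
        self.to_q = nn.Conv2d(dim, dim, 1)
        self.to_k = nn.Conv2d(dim, dim, 1)
        self.to_v = nn.Conv2d(dim, dim, 1)
        self.proj = nn.Conv2d(dim, dim, 1)
        # Offset network: predicts a 2D offset per reference point from queries.
        self.offset_net = nn.Sequential(
            nn.Conv2d(dim, dim, 3, padding=1, groups=dim),
            nn.GELU(),
            nn.Conv2d(dim, 2, 1),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        B, C, H, W = x.shape
        g = self.grid_size
        q = self.to_q(x)                                     # (B, C, H, W)

        # Pool queries onto the reference grid, predict offsets from them.
        q_grid = F.adaptive_avg_pool2d(q, g)                 # (B, C, g, g)
        offsets = self.offset_net(q_grid)                    # (B, 2, g, g)
        offsets = offsets.permute(0, 2, 3, 1)                # (B, g, g, 2)
        # Bound offsets; scale to normalized [-1, 1] coordinates.
        offsets = offsets.tanh() * (self.offset_range * 2.0 / g)

        # Uniform reference points in normalized [-1, 1] coordinates.
        ys = torch.linspace(-1, 1, g, device=x.device)
        xs = torch.linspace(-1, 1, g, device=x.device)
        ref = torch.stack(torch.meshgrid(ys, xs, indexing="ij"), dim=-1)
        ref = ref.flip(-1)                                   # grid_sample wants (x, y)
        pos = (ref.unsqueeze(0) + offsets).clamp(-1, 1)      # (B, g, g, 2)

        # Sample features at the deformed points; keys/values come from them.
        sampled = F.grid_sample(x, pos, align_corners=True)  # (B, C, g, g)
        k = self.to_k(sampled).flatten(2)                    # (B, C, g*g)
        v = self.to_v(sampled).flatten(2)                    # (B, C, g*g)

        # Dense queries attend to the sparse, deformed key/value set.
        q_flat = q.flatten(2).transpose(1, 2)                # (B, H*W, C)
        attn = ((q_flat @ k) * self.scale).softmax(dim=-1)   # (B, H*W, g*g)
        out = attn @ v.transpose(1, 2)                       # (B, H*W, C)
        out = out.transpose(1, 2).reshape(B, C, H, W)
        return self.proj(out)
```

The cost saving comes from attending to only `grid_size**2` deformed key/value positions instead of all H*W positions, while the offset network keeps the sampling locations query-dependent.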
Results
SOTA results on image classification (ImageNet-1K), object detection (COCO), and semantic segmentation (ADE20K).
Other (e.g., why was it accepted?)
- …
- …