Shunted Self-Attention via Multi-Scale Token Aggregation

#195

Summarized by: Anonymous

Sucheng Ren; Daquan Zhou; Shengfeng He; Jiashi Feng; Xinchao Wang

What kind of paper is it?

It proposes shunted self-attention, which lets ViTs model attention at hybrid (multiple) scales within each attention layer.

Novelty

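Within one attention layer, half of the attention heads compute their Keys and Values from features downsampled with ratio r1. The other half use features downsampled with ratio r2. A larger ratio merges more tokens, so those heads capture coarser, larger-scale structure, while the smaller ratio preserves finer detail. The learned features are then aggregated in the FFN by a depth-wise convolution.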
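The head split described above can be sketched in NumPy as follows. This is a minimal, non-authoritative sketch and not the authors' implementation. Several assumptions apply:

- Average pooling stands in for the learned downsampling.
- Queries are kept at full resolution.
- There is no output projection and no FFN.
- The function names `avg_pool_tokens` and `shunted_attention` are hypothetical.

Under these assumptions, heads using ratio r attend over N/r² keys instead of N.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def avg_pool_tokens(x, h, w, r):
    # x: (h*w, c) tokens in row-major order -> (h//r * w//r, c)
    # ASSUMPTION: average pooling stands in for the learned conv downsampling
    c = x.shape[1]
    grid = x.reshape(h, w, c)[: h // r * r, : w // r * r]
    pooled = grid.reshape(h // r, r, w // r, r, c).mean(axis=(1, 3))
    return pooled.reshape(-1, c)

def shunted_attention(x, h, w, w_q, w_k, w_v, num_heads, r1, r2):
    # x: (N, C) with N = h*w; w_q, w_k, w_v: (C, C) projection matrices
    n, c = x.shape
    d = c // num_heads                       # per-head dimension
    q = (x @ w_q).reshape(n, num_heads, d)   # queries stay at full resolution
    outputs = []
    for head in range(num_heads):
        # first half of the heads uses ratio r1, second half uses ratio r2
        r = r1 if head < num_heads // 2 else r2
        x_r = avg_pool_tokens(x, h, w, r)    # (N / r^2, C)
        cols = slice(head * d, (head + 1) * d)
        k = x_r @ w_k[:, cols]               # (N / r^2, d)
        v = x_r @ w_v[:, cols]               # (N / r^2, d)
        attn = softmax(q[:, head] @ k.T / np.sqrt(d))  # (N, N / r^2)
        outputs.append(attn @ v)             # (N, d)
    return np.concatenate(outputs, axis=1)   # (N, C)

# Usage: 8x8 token grid, 4 heads; heads 0-1 see 16 keys, heads 2-3 see 4 keys
rng = np.random.default_rng(0)
h = w = 8
c, heads = 16, 4
x = rng.standard_normal((h * w, c))
wq, wk, wv = (rng.standard_normal((c, c)) * 0.1 for _ in range(3))
out = shunted_attention(x, h, w, wq, wk, wv, heads, r1=2, r2=4)
print(out.shape)  # (64, 16)
```

Setting r1 = r2 = 1 reduces this sketch to ordinary multi-head attention, which is a convenient sanity check.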

Results

It achieves state-of-the-art results on ImageNet-1K and COCO compared with other backbones.

Other notes (e.g., why was it accepted?)

The downsampling is implemented by convolutions with the same kernel size but different strides.
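This sketch shows how one kernel size combined with different strides yields key/value maps of different resolution. It uses a naive NumPy strided convolution. The following are illustrative assumptions, not values from the summary:

- kernel size 8 and padding 2,
- strides 4 and 8,
- a 56x56 input.

The helper names `conv2d_out_size` and `strided_conv2d` are hypothetical.

```python
import numpy as np

def conv2d_out_size(size, kernel, stride, padding):
    # Standard output length of a convolution along one spatial axis
    return (size + 2 * padding - kernel) // stride + 1

def strided_conv2d(x, weight, stride, padding):
    # x: (H, W, C_in); weight: (k, k, C_in, C_out); naive loop, no bias
    k = weight.shape[0]
    h_out = conv2d_out_size(x.shape[0], k, stride, padding)
    w_out = conv2d_out_size(x.shape[1], k, stride, padding)
    x = np.pad(x, ((padding, padding), (padding, padding), (0, 0)))
    out = np.zeros((h_out, w_out, weight.shape[3]))
    for i in range(h_out):
        for j in range(w_out):
            patch = x[i * stride : i * stride + k, j * stride : j * stride + k]
            # contract the (k, k, C_in) patch with the kernel -> (C_out,)
            out[i, j] = np.tensordot(patch, weight, axes=([0, 1, 2], [0, 1, 2]))
    return out

# Usage: same kernel size (8, hypothetical), strides 4 and 8 (hypothetical)
rng = np.random.default_rng(0)
feat = rng.standard_normal((56, 56, 8))  # H x W x C feature map
for r in (4, 8):
    weight = rng.standard_normal((8, 8, 8, 8)) * 0.01  # separate weights per branch
    kv_map = strided_conv2d(feat, weight, stride=r, padding=2)
    print(r, kv_map.shape)
# 4 (14, 14, 8)
# 8 (7, 7, 8)
```

With these assumed settings, the stride-4 branch keeps 196 key/value tokens and the stride-8 branch keeps 49. The coarser heads therefore attend over a quarter as many tokens.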

The images used on this page are taken from the paper.