- …
- …
#370
summarized by: Anonymous
What is this paper about?
A general backbone design that tackles two weaknesses of local-window self-attention: its limited receptive field and its weak modeling capability.
Novelty
1) Combines local-window self-attention with depth-wise convolution in two parallel branches to enlarge the receptive field; 2) adds bi-directional interactions (channel and spatial attention) between the two branches (see the sketch below).
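A minimal PyTorch sketch of the two-branch block described above, not the authors' implementation: the module names, the reduction ratio, fusion by concatenation, and which branch gates which are all assumptions for illustration.

```python
import torch
import torch.nn as nn


class ChannelInteraction(nn.Module):
    """SE-style channel gate: global context from one branch
    re-weights the channels of the other branch."""
    def __init__(self, dim, reduction=8):
        super().__init__()
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),               # squeeze: (B, C, 1, 1)
            nn.Conv2d(dim, dim // reduction, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(dim // reduction, dim, 1),
            nn.Sigmoid(),                          # excite: per-channel weights
        )

    def forward(self, x):
        return self.gate(x)


class SpatialInteraction(nn.Module):
    """Spatial gate: a per-pixel weight map from one branch
    re-weights the spatial positions of the other branch."""
    def __init__(self, dim, reduction=8):
        super().__init__()
        self.gate = nn.Sequential(
            nn.Conv2d(dim, dim // reduction, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(dim // reduction, 1, 1),
            nn.Sigmoid(),                          # (B, 1, H, W) weights
        )

    def forward(self, x):
        return self.gate(x)


class ParallelMixBlock(nn.Module):
    """Local-window attention and depth-wise conv run in parallel;
    bi-directional gates exchange information between the branches."""
    def __init__(self, dim, window_size=7, num_heads=4):
        super().__init__()
        self.window_size = window_size
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.dwconv = nn.Sequential(
            nn.Conv2d(dim, dim, 3, padding=1, groups=dim),  # depth-wise 3x3
            nn.BatchNorm2d(dim),
            nn.GELU(),
        )
        self.channel_gate = ChannelInteraction(dim)  # conv branch -> attention branch
        self.spatial_gate = SpatialInteraction(dim)  # attention branch -> conv branch
        self.proj = nn.Conv2d(2 * dim, dim, 1)       # fuse the concatenated branches

    def window_attention(self, x):
        # Partition (B, C, H, W) into non-overlapping windows, attend within
        # each window, then stitch the windows back together. H and W are
        # assumed divisible by window_size for brevity.
        B, C, H, W = x.shape
        ws = self.window_size
        x = x.reshape(B, C, H // ws, ws, W // ws, ws)
        x = x.permute(0, 2, 4, 3, 5, 1).reshape(-1, ws * ws, C)
        x, _ = self.attn(x, x, x)
        x = x.reshape(B, H // ws, W // ws, ws, ws, C)
        x = x.permute(0, 5, 1, 3, 2, 4).reshape(B, C, H, W)
        return x

    def forward(self, x):
        conv_out = self.dwconv(x)
        attn_out = self.window_attention(x)
        # Bi-directional interaction: each branch gates the other.
        attn_out = attn_out * self.channel_gate(conv_out)
        conv_out = conv_out * self.spatial_gate(attn_out)
        return self.proj(torch.cat([attn_out, conv_out], dim=1))


# Quick smoke test with hypothetical dimensions.
block = ParallelMixBlock(dim=96, window_size=7)
y = block(torch.randn(1, 96, 56, 56))  # -> (1, 96, 56, 56)
```

Per the summary, the interactions are SE-style attention; here the convolution branch supplies a channel gate and the attention branch supplies a spatial gate, so each branch receives the kind of context it lacks, but that assignment is an assumption of this sketch.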
Results
Outperforms alternative backbones by significant margins at lower computational cost on five dense prediction tasks across MS COCO, ADE20K, and LVIS.
Other notes (e.g., why it was accepted)
The proposed design is limited to window-based vision transformers; experiments that use global attention instead show slightly worse results on ImageNet-1K; the interactions between branches are SE-style attention.
- …
- …