- …
- …
#195
summarized by : Anonymous
Novelty
Half of the attention heads compute their keys and values from features downsampled with ratio r1, and the other half from features downsampled with ratio r2; the resulting multi-scale features are aggregated in the FFN by a depth-wise convolution.
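To make the head split concrete, here is a minimal pure-Python sketch of the idea on a 1D token sequence. It is an illustration under stated assumptions, not the paper's implementation: the names `downsample`, `attend`, and `shunted_attention` are hypothetical, the learned Q/K/V projections are omitted (identity), average pooling stands in for the learned downsampling, and the depth-wise convolution in the FFN is not covered.

```python
import math

def downsample(tokens, rate):
    """Average each run of `rate` consecutive tokens.

    Assumption: average pooling is a stand-in for the learned downsampling.
    """
    dim = len(tokens[0])
    out = []
    for start in range(0, len(tokens) - rate + 1, rate):
        group = tokens[start:start + rate]
        out.append([sum(t[c] for t in group) / rate for c in range(dim)])
    return out

def attend(queries, keys, values):
    """Scaled dot-product attention for a single head, on plain lists."""
    scale = 1.0 / math.sqrt(len(queries[0]))
    out = []
    for q in queries:
        scores = [scale * sum(a * b for a, b in zip(q, k)) for k in keys]
        peak = max(scores)  # subtract the max for numerical stability
        weights = [math.exp(s - peak) for s in scores]
        total = sum(weights)
        out.append([sum(w * v[c] for w, v in zip(weights, values)) / total
                    for c in range(len(values[0]))])
    return out

def shunted_attention(tokens, num_heads, r1, r2):
    """First half of the heads uses K/V downsampled by r1, the rest by r2."""
    head_dim = len(tokens[0]) // num_heads
    head_outputs, kv_lengths = [], []
    for h in range(num_heads):
        rate = r1 if h < num_heads // 2 else r2
        lo, hi = h * head_dim, (h + 1) * head_dim
        q = [t[lo:hi] for t in tokens]                      # full-resolution queries
        kv = [t[lo:hi] for t in downsample(tokens, rate)]   # coarser keys/values
        kv_lengths.append(len(kv))
        head_outputs.append(attend(q, kv, kv))
    # Concatenate the per-head outputs back along the channel axis.
    merged = [sum((head[i] for head in head_outputs), [])
              for i in range(len(tokens))]
    return merged, kv_lengths

tokens = [[float(i), float(i) + 1.0, float(i) * 2.0, 1.0] for i in range(8)]
merged, kv_lengths = shunted_attention(tokens, num_heads=2, r1=4, r2=2)
print(kv_lengths)  # [2, 4]
```

Every query still attends at full resolution; only the key/value sequence length differs between the two head groups, which is what lets one layer mix coarse and fine context.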
Results
State-of-the-art results on ImageNet-1k and COCO compared with other backbones.
Other (why was it accepted?, etc.)
Downsampling is implemented by convolutions with the same kernel size but different strides.
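As a rough illustration of how the stride alone changes the number of key/value tokens, the sketch below applies the standard convolution output-size formula, floor((size + 2·padding − kernel) / stride) + 1. The helper names and the concrete numbers (a 56×56 feature map, kernel size 8, strides 8 and 4, no padding) are hypothetical and are not taken from the paper.

```python
def conv_output_size(size, kernel, stride, padding=0):
    """Output length of a convolution along one axis (dilation 1)."""
    return (size + 2 * padding - kernel) // stride + 1

def kv_token_count(height, width, kernel, stride, padding=0):
    """Number of key/value tokens after downsampling a height x width map."""
    return (conv_output_size(height, kernel, stride, padding)
            * conv_output_size(width, kernel, stride, padding))

# Hypothetical setting: same kernel size, two different strides.
coarse = kv_token_count(56, 56, kernel=8, stride=8)  # 7 * 7 tokens
fine = kv_token_count(56, 56, kernel=8, stride=4)    # 13 * 13 tokens
print(coarse, fine)  # 49 169
```

With a fixed kernel, the smaller stride yields overlapping windows and a longer key/value sequence, so that head group sees finer detail at higher attention cost.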
- …
- …