Not All Tokens Are Equal: Human-Centric Visual Analysis via Token Clustering Transformer

#181

Summarized by: Anonymous

Wang Zeng; Sheng Jin; Wentao Liu; Chen Qian; Ping Luo; Wanli Ouyang; Xiaogang Wang

What kind of paper is this?

The paper improves human-centric vision tasks with a non-uniform token distribution: more tokens are allocated to important regions and fewer to background regions.

Novelty

Tokens are merged by progressive token clustering, so a merged token can combine non-adjacent tokens and cover a non-rectangular region. The paper also proposes Multi-stage Token Aggregation (MTA), which combines features across stages in an FPN-like way (upsampling merged tokens, etc.).
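The merging and upsampling ideas above can be illustrated with a minimal sketch. It is not the paper's implementation: plain k-means is used here as a stand-in for the clustering step, simple averaging is assumed for merging, and the function names `merge_tokens` and `upsample_tokens` are hypothetical. It only shows how tokens that are not neighbors in the sequence can end up in one merged token, and how the cluster assignment lets merged features be copied back to the original token positions, which is the kind of upsampling an FPN-like aggregation would need.

```python
import numpy as np


def merge_tokens(tokens, num_clusters, num_iters=10, seed=0):
    """Cluster tokens by feature similarity and average each cluster.

    Assumption: k-means is a stand-in for the clustering step; the
    clustering and merging rules in the paper may differ.
    Returns (merged, assign): merged has shape (num_clusters, dim) and
    assign[i] is the cluster index of original token i.
    """
    tokens = np.asarray(tokens, dtype=float)
    rng = np.random.default_rng(seed)
    init = rng.choice(len(tokens), size=num_clusters, replace=False)
    centers = tokens[init].copy()
    for _ in range(num_iters):
        # Squared distance from every token to every center: (n, k).
        diffs = tokens[:, None, :] - centers[None, :, :]
        assign = (diffs ** 2).sum(axis=-1).argmin(axis=1)
        for c in range(num_clusters):
            members = tokens[assign == c]
            if len(members) > 0:  # keep the old center if a cluster is empty
                centers[c] = members.mean(axis=0)
    return centers, assign


def upsample_tokens(merged, assign):
    """Copy each merged token back to the positions of its member tokens."""
    return merged[assign]


# Tokens 0 and 2 are similar, as are 1 and 3, although neither pair is adjacent.
tokens = np.array([[0.0, 0.0], [10.0, 10.0], [0.0, 1.0], [10.0, 11.0]])
merged, assign = merge_tokens(tokens, num_clusters=2)
restored = upsample_tokens(merged, assign)
print(assign, restored.shape)
```

Because the assignment is driven by feature distance rather than grid position, the region covered by a merged token is whatever set of tokens landed in the same cluster, so it does not have to be rectangular or contiguous.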

Results

State-of-the-art results on whole-body pose estimation on COCO-WholeBody and on 3D human mesh reconstruction on 3DPW.

Other (why was it accepted? etc.)

The images used on this page are taken from the paper.