#181
summarized by : Anonymous
Not All Tokens Are Equal: Human-Centric Visual Analysis via Token Clustering Transformer

What kind of paper is this?

Improves human-centric vision tasks via a non-uniform token distribution (more tokens for important regions, fewer tokens for background regions)

Novelty

Merges tokens by progressive token clustering (merged tokens need not be adjacent and can form non-rectangular regions); proposes Multi-stage Token Aggregation (MTA), which works in an FPN-like way (upsampling merged tokens, etc.)
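
A minimal sketch of what merging and upsampling tokens could look like, for illustration only. It assumes the cluster assignment (`cluster_ids`) is already given, that merging is a plain average of the tokens in each cluster, and that upsampling copies each merged token back to its member positions. The function names `merge_tokens` and `upsample_tokens` are hypothetical, and none of this is taken from the paper's implementation.

```python
from typing import List

Token = List[float]


def merge_tokens(tokens: List[Token], cluster_ids: List[int], num_clusters: int) -> List[Token]:
    """Average the features of all tokens assigned to the same cluster.

    Assumption: merging is a plain mean; the cluster members do not need
    to be spatially adjacent.
    """
    dim = len(tokens[0])
    sums = [[0.0] * dim for _ in range(num_clusters)]
    counts = [0] * num_clusters
    for feat, cid in zip(tokens, cluster_ids):
        counts[cid] += 1
        for d in range(dim):
            sums[cid][d] += feat[d]
    # max(count, 1) avoids division by zero for an empty cluster.
    return [[s / max(c, 1) for s in row] for row, c in zip(sums, counts)]


def upsample_tokens(merged: List[Token], cluster_ids: List[int]) -> List[Token]:
    """Copy each merged token back to every original token position in its cluster.

    Assumption: a simple copy stands in for the upsampling step.
    """
    return [list(merged[cid]) for cid in cluster_ids]


# Five tokens with 2-D features; tokens 0 and 2 (not adjacent) share cluster 0.
tokens = [[1.0, 0.0], [3.0, 0.0], [0.0, 2.0], [0.0, 4.0], [5.0, 5.0]]
cluster_ids = [0, 1, 0, 1, 2]

merged = merge_tokens(tokens, cluster_ids, num_clusters=3)
upsampled = upsample_tokens(merged, cluster_ids)
```

In this toy case, five tokens are reduced to three, and the upsampled result again has five entries, so it could be combined element-wise with features from an earlier, higher-resolution stage.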

Results

State-of-the-art (SOTA) results on whole-body pose estimation (COCO-WholeBody) and 3D human mesh reconstruction (3DPW)

Other (why was it accepted?, etc.)