Dynamic Fusion With Intra- and Inter-Modality Attention Flow for Visual Question Answering

#859

summarized by : Keito Ishihara

Peng Gao, Zhengkai Jiang, Haoxuan You, Pan Lu, Steven C. H. Hoi, Xiaogang Wang, Hongsheng Li

What kind of paper is this?

Proposes an attention-flow mechanism for fusing multimodal image and language features in visual question answering (VQA).

Novelty

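The paper proposes a Dynamic Intra-modality Attention Flow module, which works like self-attention within a single modality, and an Inter-modality Attention Flow module, which works like source-target attention between modalities. Combining the two fuses the text and image features.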
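The two kinds of attention flow can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: learned query/key/value projections, multiple heads, and stacking are omitted, and the function names, the residual update, the sigmoid gate of the form (1 + gate), and the `gate_proj` matrix are all assumptions made for illustration.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def attend(queries, keys, values):
    """Scaled dot-product attention; returns (output, attention weights)."""
    d = queries.shape[-1]
    weights = softmax(queries @ keys.T / np.sqrt(d), axis=-1)
    return weights @ values, weights

def inter_modality_flow(target, source):
    """Source-target attention: each target item gathers information
    from the other modality (hypothetical residual update)."""
    update, _ = attend(target, source, source)
    return target + update

def dynamic_intra_modality_flow(features, other, gate_proj):
    """Self-attention within one modality, with queries and keys modulated
    by a gate computed from the mean-pooled other modality (assumed form)."""
    gate = sigmoid(other.mean(axis=0) @ gate_proj)  # shape (d,)
    q = (1.0 + gate) * features
    k = (1.0 + gate) * features
    update, _ = attend(q, k, features)
    return features + update

# Toy inputs: 5 image regions and 3 question words, both with 8-dim features.
rng = np.random.default_rng(0)
d = 8
image_regions = rng.normal(size=(5, d))
question_words = rng.normal(size=(3, d))
gate_proj = rng.normal(size=(d, d)) * 0.1  # hypothetical gate weights

# One fusion step: inter-modality flow first, then intra-modality flow.
img = inter_modality_flow(image_regions, question_words)
txt = inter_modality_flow(question_words, image_regions)
img = dynamic_intra_modality_flow(img, txt, gate_proj)
txt = dynamic_intra_modality_flow(txt, img, gate_proj)
```

In this sketch each modality keeps its own number of items (5 regions, 3 words) while its features absorb information from the other modality, which is the sense in which the two flows together fuse text and image features.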

Results

Experiments on the VQA 2.0 dataset achieved state-of-the-art results. An ablation study of the modules was also conducted.

Other (why was it accepted? etc.)

The images used on this page are taken from the paper.