PhraseClick: Toward Achieving Flexible Interactive Segmentation by Phrase and Click

#74

Summarized by: Yue Qiu

Henghui Ding, Scott Cohen, Brian Price, Xudong Jiang

What kind of paper is this?

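Introduces contrastive learning into phrase grounding, making it possible to learn the association between words and image regions with weak supervision. Given image-caption pairs as input, the method maximizes (a lower bound on) the compatibility between the attention-weighted image regions and the words of a corresponding pair, relative to non-corresponding pairs.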

Novelty

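① Formulates phrase grounding as a contrastive learning problem. The method estimates the mutual information between image regions and caption words and maximizes a lower bound on it, so the relationship between image regions and words can be learned with weak supervision; ② Proposes a method that uses a language model to generate negative captions.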
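The contrastive objective in ① can be illustrated with a minimal sketch. It is not the paper's implementation: it assumes words and regions are already embedded in the same vector space, uses plain dot-product attention, and scores one positive word against words taken from negative captions with an InfoNCE-style loss. All function names (`attended_region`, `compatibility`, `info_nce_loss`) and the toy vectors are hypothetical.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def softmax(xs):
    # Subtract the max for numerical stability.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attended_region(word, regions):
    """Attention-weighted sum of region features, with the word as the query."""
    scale = math.sqrt(len(word))
    weights = softmax([dot(word, r) / scale for r in regions])
    dim = len(regions[0])
    return [sum(w * r[i] for w, r in zip(weights, regions)) for i in range(dim)]


def compatibility(word, regions):
    """Score between a word and the image regions it attends to (assumed form)."""
    return dot(word, attended_region(word, regions))


def info_nce_loss(regions, positive_word, negative_words):
    """Cross-entropy of picking the true word among all candidates.

    With K candidates in total, log(K) - loss is the InfoNCE lower bound
    on mutual information, so minimizing the loss raises the bound.
    """
    candidates = [positive_word] + list(negative_words)
    scores = [compatibility(w, regions) for w in candidates]
    return -math.log(softmax(scores)[0])


# Toy example: two region features, one matching word, one negative word.
regions = [[1.0, 0.0], [0.0, 1.0]]
dog = [2.0, 0.0]   # word from the true caption (hypothetical embedding)
cat = [0.0, -2.0]  # word from a negative caption (hypothetical embedding)
loss = info_nce_loss(regions, dog, [cat])
```

In this toy setup the loss stays below log 2 because the true word is more compatible with the image than the negative one. A negative that is indistinguishable from the positive would give exactly log 2, which is one way to see why harder, more plausible negatives provide a more informative training signal than trivially wrong ones.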

Results

① Compared with random sampling of negatives, the proposed generated negative captions improved accuracy by roughly 10%; ② A model trained on COCO-Captions reached 76.7% accuracy on the Flickr30K dataset.

Other (why was it accepted? etc.)

① Introduced contrastive learning into phrase grounding; ② Unsupervised with respect to region-word correspondences (training uses only image-caption pairs).

The images used on this page are taken from the paper.