On Vocabulary Reliance in Scene Text Recognition

#615

summarized by : Yue Qiu

Zhaoyi Wan, Jielei Zhang, Liang Zhang, Jiebo Luo, Cong Yao

What kind of paper is this?

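Even scene text recognition models that reach high accuracy perform worse on out-of-vocabulary words because they lean on the vocabulary seen during training, a problem the authors call "vocabulary reliance". The paper runs exhaustive experiments over existing context modeling modules, prediction modules, and their combinations, derives findings about vocabulary reliance, and proposes a way to mitigate it.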
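One simple way to make vocabulary reliance concrete is to compare word accuracy on test words that appear in the training vocabulary with accuracy on words that do not. The sketch below is only an illustration under that assumption. The function name `split_accuracy`, the toy words, and the use of a plain accuracy gap as the measure are hypothetical and are not taken from the paper, which may define its metrics differently.

```python
def split_accuracy(predictions, labels, train_vocab):
    """Return (in_vocab_acc, out_vocab_acc, gap) for word-level predictions.

    A label counts as in-vocabulary when it appears in train_vocab.
    The gap is in-vocabulary accuracy minus out-of-vocabulary accuracy.
    """
    iv_total = iv_correct = oov_total = oov_correct = 0
    for pred, gold in zip(predictions, labels):
        if gold in train_vocab:
            iv_total += 1
            iv_correct += int(pred == gold)
        else:
            oov_total += 1
            oov_correct += int(pred == gold)
    # Guard against an empty split so the function never divides by zero.
    iv_acc = iv_correct / iv_total if iv_total else float("nan")
    oov_acc = oov_correct / oov_total if oov_total else float("nan")
    return iv_acc, oov_acc, iv_acc - oov_acc


# Toy data (hypothetical): two labels are random strings outside the vocabulary.
train_vocab = {"coffee", "street", "open"}
labels = ["coffee", "street", "xqzt", "open", "kvrm"]
predictions = ["coffee", "street", "exit", "open", "kvrm"]

iv_acc, oov_acc, gap = split_accuracy(predictions, labels, train_vocab)
print(iv_acc, oov_acc, gap)
```

In this toy case the model misreads "xqzt" as the dictionary word "exit", which is the kind of error a vocabulary-reliant recognizer would be expected to make. A larger gap suggests stronger reliance on the training vocabulary.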

Novelty

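The study yields three findings: (1) every existing method suffers from vocabulary reliance to some degree, so the problem is ubiquitous; (2) attention-based decoders are weak on out-of-vocabulary words, whereas segmentation-based decoders make effective use of image features; (3) choosing the right combination of context modeling and prediction modules is important for accuracy.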

Results

The authors propose a framework in which an attention-based decoder and a segmentation-based decoder are trained collaboratively through mutual learning. The framework mitigates vocabulary reliance while also improving scene text recognition accuracy.
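As a rough illustration of what mutual learning between two decoders can look like, the sketch below follows the general deep mutual learning recipe: each decoder is trained with its own cross-entropy loss plus a KL term that pulls it toward the other decoder's prediction. This is a sketch under assumptions, not the authors' implementation. It assumes both decoders emit aligned per-character probability distributions, and the names `mutual_learning_losses` and `weight` are hypothetical; the paper's exact loss may differ.

```python
import math


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions given as lists of probabilities."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def cross_entropy(dist, target_index, eps=1e-12):
    """Negative log-likelihood of the target class under dist."""
    return -math.log(dist[target_index] + eps)


def mutual_learning_losses(attn_probs, seg_probs, targets, weight=1.0):
    """Return (attention_loss, segmentation_loss), averaged over character steps.

    attn_probs, seg_probs: one probability distribution per character step.
    targets: index of the ground-truth character at each step.
    Each decoder gets cross-entropy plus a KL term toward its peer.
    """
    steps = len(targets)
    attn_loss = 0.0
    seg_loss = 0.0
    for a, s, t in zip(attn_probs, seg_probs, targets):
        attn_loss += cross_entropy(a, t) + weight * kl_divergence(s, a)
        seg_loss += cross_entropy(s, t) + weight * kl_divergence(a, s)
    return attn_loss / steps, seg_loss / steps


# Toy example (hypothetical): two character steps over a 2-symbol alphabet.
attn_probs = [[0.9, 0.1], [0.2, 0.8]]
seg_probs = [[0.6, 0.4], [0.3, 0.7]]
targets = [0, 1]

attn_loss, seg_loss = mutual_learning_losses(attn_probs, seg_probs, targets)
print(attn_loss, seg_loss)
```

When the two decoders agree exactly, the KL terms vanish and each loss reduces to plain cross-entropy. In a real training loop the peer's distribution would typically be treated as a fixed target when computing each decoder's gradient.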

Other notes (why was it accepted? etc.)

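The paper identifies vocabulary reliance, a problem that is important for scene text recognition but had received little attention, and investigates it exhaustively. It offers useful findings for future research in this field.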

The images used on this page are taken from the paper.