Boost Image Captioning with Knowledge Reasoning

Feicheng Huang; Zhixin Li; Haiyang Wei; Canlong Zhang; Huifang Ma

Automatically generating a human-like description for a given image is a promising research direction in artificial intelligence and has attracted a great deal of attention recently. Most existing attention methods explore the mapping relationships between words in the sentence and regions in the image. This unpredictable matching sometimes causes inharmonious alignments that reduce the quality of the generated captions. In this paper, we aim to reason about more accurate and meaningful captions. We first propose word attention to improve the correctness of visual attention when generating descriptions word by word. Word attention emphasizes the importance of each word as the model focuses on different regions of the input image, and makes full use of internal annotation knowledge to assist the calculation of visual attention. Then, to reveal intentions that are hard to grasp and that machines cannot express straightforwardly, we introduce a new strategy that injects external knowledge extracted from a knowledge graph into the encoder-decoder framework to facilitate meaningful captioning. Finally, we validate our model on two freely available captioning benchmarks, the Microsoft COCO and Flickr30k datasets. The results demonstrate that our approach achieves state-of-the-art performance and outperforms many existing approaches.
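The abstract does not give the equations for word attention or for how it assists visual attention. As a rough illustration only, the following sketch assumes dot-product scoring and assumes that the word-attention output is a context vector added to the decoder state to form the visual-attention query. The function names (`word_attention`, `visual_attention`) and the toy vectors are hypothetical; this is a generic soft-attention example, not the authors' formulation.

```python
import math
from typing import List, Tuple

Vector = List[float]


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def weighted_sum(weights: List[float], vectors: List[Vector]) -> Vector:
    dim = len(vectors[0])
    return [sum(w * v[i] for w, v in zip(weights, vectors)) for i in range(dim)]


def word_attention(word_embs: List[Vector], hidden: Vector) -> Vector:
    """Hypothetical word attention: weight the embeddings of the words
    generated so far by their dot-product relevance to the decoder state."""
    weights = softmax([dot(w, hidden) for w in word_embs])
    return weighted_sum(weights, word_embs)


def visual_attention(
    regions: List[Vector], hidden: Vector, word_ctx: Vector
) -> Tuple[List[float], Vector]:
    """Hypothetical visual attention whose query is the decoder state plus
    the word-attention context (an assumed way of 'assisting' it)."""
    query = [h + c for h, c in zip(hidden, word_ctx)]
    alphas = softmax([dot(r, query) for r in regions])
    return alphas, weighted_sum(alphas, regions)


# Toy usage: three image-region features, two previously generated words.
regions = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
words = [[0.2, 0.9], [0.1, 0.1]]
hidden = [0.3, 0.7]

word_ctx = word_attention(words, hidden)
alphas, visual_ctx = visual_attention(regions, hidden, word_ctx)
print("region weights:", alphas)
print("visual context:", visual_ctx)
```

Adding the word context to the query is only one simple way to let word attention shift which regions receive weight; concatenation followed by a learned projection would be an equally plausible assumption.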

updated: Mon Nov 02 2020 12:19:46 GMT+0000 (UTC)

published: Mon Nov 02 2020 12:19:46 GMT+0000 (UTC)

arXiv