Entity-Focused Dense Passage Retrieval for Outside-Knowledge Visual Question Answering

Jialin Wu; Raymond J. Mooney

Most Outside-Knowledge Visual Question Answering (OK-VQA) systems employ a two-stage framework that first retrieves external knowledge given the visual question and then predicts the answer based on the retrieved content. However, the retrieved knowledge is often inadequate. Retrievals are frequently too general and fail to cover the specific knowledge needed to answer the question. Also, the naturally available supervision (whether the passage contains the correct answer) is weak and does not guarantee relevance to the question. To address these issues, we propose an Entity-Focused Retrieval (EnFoRe) model that provides stronger supervision during training and recognizes question-relevant entities to help retrieve more specific knowledge. Experiments show that our EnFoRe model achieves superior retrieval performance on OK-VQA, currently the largest outside-knowledge VQA dataset. We also combine the retrieved knowledge with state-of-the-art VQA models and achieve new state-of-the-art performance on OK-VQA.
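
To make the "weak supervision" point concrete, here is a minimal sketch of answer-containment labeling, in which a passage counts as a positive whenever it contains a gold answer string. The function name, the question, and the passages are all hypothetical illustrations. They are not taken from the paper or from the OK-VQA dataset, and this is not the EnFoRe training procedure.

```python
from typing import List


def contains_answer(passage: str, answers: List[str]) -> bool:
    """Weak label: True if any gold answer string appears in the passage."""
    text = passage.lower()
    return any(ans.lower() in text for ans in answers)


# Hypothetical example data, invented for illustration only.
question = "What sport can you use this board for?"
answers = ["surfing"]
passages = [
    "Surfing is a surface water sport in which a rider uses a board to ride waves.",
    "The hotel gift shop sells postcards of surfing dogs.",
    "A skateboard is a board mounted on wheels.",
]

# Label every passage by simple substring containment of the answer.
weak_labels = [contains_answer(p, answers) for p in passages]
print(weak_labels)
```

In this toy data the second passage is labeled positive because it mentions the answer word, even though it says nothing useful about the question. That is the sense in which containment-based supervision does not guarantee relevance.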

updated: Tue Oct 18 2022 21:39:24 GMT+0000 (UTC)

published: Tue Oct 18 2022 21:39:24 GMT+0000 (UTC)

arXiv
