Visuo-Linguistic Question Answering (VLQA) Challenge

Shailaja Keyur Sampat; Yezhou Yang; Chitta Baral

Understanding images and text together is an important aspect of cognition and of building advanced Artificial Intelligence (AI) systems. As a community, we have achieved good benchmark results in the language and vision domains separately; however, joint reasoning is still a challenge for state-of-the-art computer vision and natural language processing (NLP) systems. We propose a novel task to derive joint inference about a given image-text modality and compile the Visuo-Linguistic Question Answering (VLQA) challenge corpus in a question answering setting. Each dataset item consists of an image and a reading passage, and the questions are designed to combine both visual and textual information, i.e., ignoring either modality would make the question unanswerable. We first explore the best existing vision-language architectures on VLQA subsets and show that they are unable to reason well. We then develop a modular method with slightly better baseline performance, but it is still far behind human performance. We believe that VLQA will be a good benchmark for reasoning over a visuo-linguistic context. The dataset, code, and leaderboard are available at https://shailaja183.github.io/vlqa/.

updated: Wed Nov 18 2020 07:45:20 GMT+0000 (UTC)

published: Fri May 01 2020 12:18:55 GMT+0000 (UTC)

arXiv
