InfographicVQA

Minesh Mathew; Viraj Bagal; Rubèn Pérez Tito; Dimosthenis Karatzas; Ernest Valveny; C. V. Jawahar

Infographics are documents designed to communicate information effectively using a combination of textual, graphical and visual elements. In this work, we explore the automatic understanding of infographic images using Visual Question Answering techniques. To this end, we present InfographicVQA, a new dataset that comprises a diverse collection of infographics along with natural language question and answer annotations. The collected questions require methods that jointly reason over the document layout, textual content, graphical elements, and data visualizations. We curate the dataset with an emphasis on questions that require elementary reasoning and basic arithmetic skills. Finally, we evaluate two strong baselines based on state-of-the-art multi-modal VQA models, and establish baseline performance for the new task. The dataset, code and leaderboard will be made available at http://docvqa.org.

updated: Mon Apr 26 2021 17:45:54 GMT+0000 (UTC)

published: Mon Apr 26 2021 17:45:54 GMT+0000 (UTC)

arXiv
