PDF-VQA: A New Dataset for Real-World VQA on PDF Documents

Yihao Ding; Siwen Luo; Hyunsuk Chung; Soyeon Caren Han

Document-based Visual Question Answering (VQA) examines how well document images are understood when conditioned on natural language questions. We propose a new document-based VQA dataset, PDF-VQA, to comprehensively examine document understanding from several aspects: document element recognition, document layout structural understanding, contextual understanding, and key information extraction. PDF-VQA extends the current scale of document understanding, which is limited to a single document page, to questions asked over full documents of multiple pages. We also propose a new graph-based VQA model that explicitly integrates the spatial and hierarchical structural relationships between document elements to improve document structural understanding. Its performance is compared with several baselines across different question types and tasks. The full dataset will be released after paper acceptance.

updated: Wed Apr 19 2023 14:10:08 GMT+0000 (UTC)

published: Thu Apr 13 2023 12:28:14 GMT+0000 (UTC)

arXiv
