Unifying Vision, Text, and Layout for Universal Document Processing

Zineng Tang; Ziyi Yang; Guoxin Wang; Yuwei Fang; Yang Liu; Chenguang Zhu; Michael Zeng; Cha Zhang; Mohit Bansal

We propose Universal Document Processing (UDOP), a foundation Document AI model which unifies text, image, and layout modalities together with varied task formats, including document understanding and generation. UDOP leverages the spatial correlation between textual content and document image to model image, text, and layout modalities with one uniform representation. With a novel Vision-Text-Layout Transformer, UDOP unifies pretraining and multi-domain downstream tasks into a prompt-based sequence generation scheme. UDOP is pretrained both on large-scale unlabeled document corpora, using innovative self-supervised objectives, and on diverse labeled data. UDOP also learns to generate document images from text and layout modalities via masked image reconstruction. To the best of our knowledge, this is the first time in the field of document AI that one model simultaneously achieves high-quality neural document editing and content customization. Our method sets the state of the art on 8 Document AI tasks, e.g., document understanding and QA, across diverse data domains like finance reports, academic papers, and websites. UDOP ranks first on the leaderboard of the Document Understanding Benchmark.
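The idea of exploiting the spatial correlation between text and image to obtain one uniform representation can be illustrated with a small sketch. It assumes that each OCR token carries a bounding box normalized to [0, 1] and that the page image is split into a square grid of patches; each text embedding is added to the embedding of the patch containing its bounding-box center, and patches that receive no token are kept as separate inputs. The names `Token`, `patch_index`, and `fuse`, and the grid layout itself, are hypothetical and chosen only for illustration; this is not UDOP's released code.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Token:
    text: str
    # Hypothetical layout format: (x0, y0, x1, y1), normalized to [0, 1].
    bbox: tuple[float, float, float, float]


def patch_index(bbox: tuple[float, float, float, float], grid_size: int) -> int:
    """Return the row-major index of the grid cell containing the bbox center."""
    x0, y0, x1, y1 = bbox
    cx, cy = (x0 + x1) / 2, (y0 + y1) / 2
    # Clamp so a center lying exactly on the right/bottom edge maps to the last cell.
    col = min(int(cx * grid_size), grid_size - 1)
    row = min(int(cy * grid_size), grid_size - 1)
    return row * grid_size + col


def fuse(
    text_embs: list[list[float]],
    patch_embs: list[list[float]],
    tokens: list[Token],
    grid_size: int,
) -> tuple[list[list[float]], list[list[float]]]:
    """Sum each text embedding with the embedding of the patch its token falls in.

    Assumption for illustration: returns (fused_text, remaining_patches), where
    remaining_patches are the patch embeddings that no token landed on.
    """
    fused: list[list[float]] = []
    used: set[int] = set()
    for emb, tok in zip(text_embs, tokens):
        idx = patch_index(tok.bbox, grid_size)
        used.add(idx)
        # Element-wise sum joins text and vision features at the same location.
        fused.append([t + p for t, p in zip(emb, patch_embs[idx])])
    remaining = [p for i, p in enumerate(patch_embs) if i not in used]
    return fused, remaining


if __name__ == "__main__":
    tokens = [Token("Total", (0.1, 0.1, 0.2, 0.15)), Token("$42", (0.6, 0.8, 0.7, 0.85))]
    patches = [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0], [4.0, 4.0]]  # 2x2 grid
    texts = [[0.5, 0.5], [0.25, 0.25]]
    fused, remaining = fuse(texts, patches, tokens, grid_size=2)
    print(fused)      # [[1.5, 1.5], [4.25, 4.25]]
    print(remaining)  # [[2.0, 2.0], [3.0, 3.0]]
```

Summing rather than concatenating keeps the sequence length close to the number of patches plus nothing extra for located tokens, which is one reason such a fusion is attractive; whether and how UDOP handles tokens that share a patch is not stated in the abstract.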

updated: Mon Mar 13 2023 17:42:44 GMT+0000 (UTC)

published: Mon Dec 05 2022 22:14:49 GMT+0000 (UTC)

arXiv