Efficient Document Image Classification Using Region-Based Graph Neural Network

Jaya Krishna Mandivarapu; Eric Bunch; Qian You; Glenn Fung

Document image classification remains a popular research area because it can be commercialized in many enterprise applications across different industries. Recent advances in large pre-trained computer vision and language models, and in graph neural networks, have given document image classification many tools. However, using large pre-trained models usually requires substantial computing resources, which can defeat the cost-saving advantages of automatic document image classification. In this paper, we propose an efficient document image classification framework that uses graph convolutional neural networks and incorporates the textual, visual and layout information of the document. We have rigorously benchmarked the proposed algorithm against several state-of-the-art (SOTA) vision and language models on both a publicly available dataset and a real-life insurance document classification dataset. Empirical results on both the public and the real-world data show that our methods achieve classification performance close to SOTA while requiring far fewer computing resources and much less time for model training and inference. This yields solutions that offer better cost advantages, especially for scalable deployment in enterprise applications. We also provide comprehensive comparisons of computing resources, model size, and training and inference time between the proposed methods and the baselines. In addition, we report the cost per image for our method and for the other baselines.
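
As a rough illustration of the kind of model the abstract describes, the NumPy sketch below runs a small graph convolutional network over document regions and pools the result into class probabilities. It is not the authors' implementation. The propagation rule is the standard symmetric-normalized GCN update of Kipf and Welling; everything else is an assumption made for illustration: regions are graph nodes, edges come from a k-nearest-neighbour rule on box centres, each node's features are its text embedding, visual embedding and normalized box coordinates concatenated together, and nodes are mean-pooled into one document vector. The names `knn_adjacency`, `classify_document` and `params` are hypothetical, and the weights are random rather than trained.

```python
import numpy as np


def knn_adjacency(boxes: np.ndarray, k: int) -> np.ndarray:
    """Hypothetical graph rule: link each region to its k nearest regions by box-centre distance."""
    centers = np.stack([(boxes[:, 0] + boxes[:, 2]) / 2,
                        (boxes[:, 1] + boxes[:, 3]) / 2], axis=1)
    dist = np.linalg.norm(centers[:, None, :] - centers[None, :, :], axis=-1)
    np.fill_diagonal(dist, np.inf)  # a region is never its own neighbour
    n = boxes.shape[0]
    k = min(k, n - 1)
    adj = np.zeros((n, n))
    nbrs = np.argsort(dist, axis=1)[:, :k]
    adj[np.repeat(np.arange(n), k), nbrs.ravel()] = 1.0
    return np.maximum(adj, adj.T)  # make the graph undirected


def normalized_adjacency(adj: np.ndarray) -> np.ndarray:
    """Kipf & Welling normalization: D^{-1/2} (A + I) D^{-1/2}."""
    a_hat = adj + np.eye(adj.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def gcn_layer(a_norm: np.ndarray, h: np.ndarray, w: np.ndarray) -> np.ndarray:
    """One graph convolution: aggregate neighbour features, project, apply ReLU."""
    return np.maximum(a_norm @ h @ w, 0.0)


def classify_document(text_feats, visual_feats, boxes, params, k=4):
    # Assumed fusion: concatenate text, visual and layout (box) features per region.
    h = np.concatenate([text_feats, visual_feats, boxes], axis=1)
    a_norm = normalized_adjacency(knn_adjacency(boxes, k))
    for w in params["gcn"]:
        h = gcn_layer(a_norm, h, w)
    doc = h.mean(axis=0)                 # mean-pool regions into one document vector
    logits = doc @ params["out"]
    exp = np.exp(logits - logits.max())  # numerically stable softmax
    return exp / exp.sum()


# Usage with random, untrained weights (shapes only, not meaningful predictions).
rng = np.random.default_rng(0)
n_regions, d_text, d_vis, n_classes = 6, 8, 5, 3
xy = rng.uniform(0.0, 0.8, size=(n_regions, 2))
boxes = np.hstack([xy, xy + 0.1])  # [x0, y0, x1, y1] in normalized page coordinates
d_in = d_text + d_vis + 4
params = {
    "gcn": [rng.normal(size=(d_in, 16)) * 0.1, rng.normal(size=(16, 16)) * 0.1],
    "out": rng.normal(size=(16, n_classes)) * 0.1,
}
probs = classify_document(rng.normal(size=(n_regions, d_text)),
                          rng.normal(size=(n_regions, d_vis)), boxes, params)
print(probs)
```

A model of this shape stays small: its parameters are a few weight matrices whose size depends on the feature widths, not on a large pre-trained backbone. That is consistent with the efficiency argument above, although the actual feature extractors and graph construction used in the paper may differ.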

updated: Fri Jun 25 2021 17:57:04 GMT+0000 (UTC)

published: Fri Jun 25 2021 17:57:04 GMT+0000 (UTC)

arXiv
