LayoutParser: A Unified Toolkit for Deep Learning Based Document Image Analysis

Zejiang Shen; Ruochen Zhang; Melissa Dell; Benjamin Charles Germain Lee; Jacob Carlson; Weining Li

Recent advances in document image analysis (DIA) have been primarily driven by the application of neural networks. Ideally, research outcomes could be easily deployed in production and extended for further investigation. However, various factors like loosely organized codebases and sophisticated model configurations complicate the easy reuse of important innovations by a wide audience. Though there have been ongoing efforts to improve reusability and simplify deep learning (DL) model development in disciplines like natural language processing and computer vision, none of them are optimized for challenges in the domain of DIA. This represents a major gap in the existing toolkit, as DIA is central to academic research across a wide range of disciplines in the social sciences and humanities. This paper introduces layoutparser, an open-source library for streamlining the usage of DL in DIA research and applications. The core layoutparser library comes with a set of simple and intuitive interfaces for applying and customizing DL models for layout detection, character recognition, and many other document processing tasks. To promote extensibility, layoutparser also incorporates a community platform for sharing both pre-trained models and full document digitization pipelines. We demonstrate that layoutparser is helpful for both lightweight and large-scale digitization pipelines in real-world use cases. The library is publicly available at https://layout-parser.github.io/.

updated: Mon Jun 21 2021 16:24:36 GMT+0000 (UTC)

published: Mon Mar 29 2021 05:55:08 GMT+0000 (UTC)

arXiv
