Multi-Type-TD-TSR -- Extracting Tables from Document Images using a Multi-stage Pipeline for Table Detection and Table Structure Recognition: from OCR to Structured Table Representations

Pascal Fischer; Alen Smajic; Alexander Mehler; Giuseppe Abrami

As global trends shift towards data-driven industries, the demand for automated algorithms that can convert digital images of scanned documents into machine-readable information is growing rapidly. Besides enabling data digitization for the application of data analytics tools, such algorithms also greatly advance the automation of processes that previously required manual inspection of documents. Although the introduction of optical character recognition technologies has largely solved the task of converting human-readable characters from images into machine-readable characters, the task of extracting table semantics has received less attention over the years. Table recognition consists of two main tasks, namely table detection and table structure recognition. Most prior work on this problem focuses on one of the two tasks without offering an end-to-end solution or paying attention to real application conditions such as rotated images or noise artefacts inside the document image. Recent work shows a clear trend towards deep learning approaches, coupled with the use of transfer learning for table structure recognition due to the lack of sufficiently large datasets. In this paper we present a multistage pipeline named Multi-Type-TD-TSR, which offers an end-to-end solution for the problem of table recognition. It utilizes state-of-the-art deep learning models for table detection and differentiates between three types of tables based on the tables' borders. For table structure recognition we use a deterministic, non-data-driven algorithm that works on all table types. We additionally present two algorithms, one for unbordered tables and one for bordered tables, which form the basis of the table structure recognition algorithm used. We evaluate Multi-Type-TD-TSR on the ICDAR 2019 table structure recognition dataset and achieve a new state of the art.
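The abstract does not spell out how the deterministic, non-data-driven structure recognition step works. Purely as an illustration of what a non-learned approach to an unbordered table can look like, the sketch below uses projection profiles: it sums the ink along rows and columns of a binarized table crop and treats empty bands as cell separators. This is a classic technique swapped in for illustration, not the authors' algorithm; the function names `ink_runs` and `segment_grid` and the tiny input grid are hypothetical.

```python
from typing import List, Tuple

Run = Tuple[int, int]              # half-open index range (start, end)
Cell = Tuple[int, int, int, int]   # (top, bottom, left, right), half-open


def ink_runs(profile: List[int]) -> List[Run]:
    """Return the maximal runs of indices whose profile value is non-zero."""
    runs: List[Run] = []
    start = None
    for i, value in enumerate(profile):
        if value > 0 and start is None:
            start = i                      # a run of ink begins
        elif value == 0 and start is not None:
            runs.append((start, i))        # whitespace closes the run
            start = None
    if start is not None:
        runs.append((start, len(profile)))  # run reaches the image edge
    return runs


def segment_grid(image: List[List[int]]) -> List[List[Cell]]:
    """Split a binary image (1 = ink, 0 = background) into a grid of cells.

    Assumes rows and columns are separated by fully empty bands, which is
    a simplification that real scans with noise or skew will often violate.
    """
    row_profile = [sum(row) for row in image]
    col_profile = [sum(col) for col in zip(*image)]
    row_runs = ink_runs(row_profile)
    col_runs = ink_runs(col_profile)
    return [
        [(top, bottom, left, right) for (left, right) in col_runs]
        for (top, bottom) in row_runs
    ]


if __name__ == "__main__":
    # Hypothetical 3x5 table crop: two text rows and two text columns.
    image = [
        [1, 1, 0, 0, 1],
        [0, 0, 0, 0, 0],
        [1, 0, 0, 1, 1],
    ]
    for row in segment_grid(image):
        print(row)
```

Each resulting cell box could then be passed to an OCR engine to fill a structured table representation. Bordered tables would need a different cue, such as detected ruling lines, which this sketch does not attempt.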

updated: Sun May 23 2021 21:17:18 GMT+0000 (UTC)

published: Sun May 23 2021 21:17:18 GMT+0000 (UTC)

arXiv
