Split, embed and merge: An accurate table structure recognizer

Zhenrong Zhang; Jianshu Zhang; Jun Du

Table structure recognition is essential for making machines understand tables. Its main task is to recognize the internal structure of a table. However, because tables vary widely in structure and style, it is very difficult to parse tabular data into a structured format that machines can easily understand, especially for complex tables. In this paper, we introduce Split, Embed and Merge (SEM), an accurate table structure recognizer. Our model takes table images as input and can correctly recognize the structure of tables, whether they are simple or complex. SEM is composed mainly of three parts: a splitter, an embedder and a merger. In the first stage, we apply the splitter to predict the potential regions of the table row (column) separators and obtain the fine grid structure of the table. In the second stage, taking full account of the textual information in the table, we fuse the output features for each table grid from both the vision and language modalities. Moreover, we achieve higher precision in our experiments by adding semantic features. Finally, we merge these basic table grids in an autoregressive manner. The corresponding merge results are learned through the attention mechanism. In our experiments, SEM achieves an average F1-measure of 97.11% on the SciTSR dataset, which outperforms other methods by a large margin. We also won first place on complex tables and third place on all tables in the ICDAR 2021 Competition on Scientific Literature Parsing, Task-B. Extensive experiments on other publicly available datasets demonstrate that our model achieves state-of-the-art performance.
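The split-then-merge idea can be illustrated with a small sketch. It is not the SEM implementation: the learned splitter and the attention-based merger are replaced here by hand-written inputs, and the function names (`run_midpoints`, `merge_cells`) and the span format are assumptions made for illustration only. The sketch shows how predicted separator regions could be collapsed into a fine grid, and how merge decisions turn basic grid cells into spanning cells.

```python
from typing import List, Set, Tuple

Span = Tuple[int, int, int, int]  # (row_start, row_end, col_start, col_end), inclusive


def run_midpoints(profile: List[int]) -> List[int]:
    """Return the midpoint index of each consecutive run of 1s.

    `profile` stands in for a splitter output: 1 marks a pixel row (or
    column) predicted to lie inside a separator region (hypothetical format).
    """
    mids: List[int] = []
    start = None
    # A trailing 0 acts as a sentinel so a run ending at the border is closed.
    for i, v in enumerate(list(profile) + [0]):
        if v and start is None:
            start = i
        elif not v and start is not None:
            mids.append((start + i - 1) // 2)
            start = None
    return mids


def merge_cells(n_rows: int, n_cols: int,
                groups: List[Set[Tuple[int, int]]]) -> List[Span]:
    """Turn merge decisions over the fine grid into cell spans.

    Each group is a set of (row, col) grid positions to merge into one cell.
    In this sketch the groups are given by hand rather than predicted.
    Grid positions that belong to no group stay as single cells.
    """
    spans: List[Span] = []
    used: Set[Tuple[int, int]] = set()
    for group in groups:
        rows = [r for r, _ in group]
        cols = [c for _, c in group]
        spans.append((min(rows), max(rows), min(cols), max(cols)))
        used |= group
    for r in range(n_rows):
        for c in range(n_cols):
            if (r, c) not in used:
                spans.append((r, r, c, c))
    return sorted(spans)


# Toy input: two row separators and one column separator.
row_seps = run_midpoints([0, 0, 1, 1, 1, 0, 0, 0, 1, 0, 0])
col_seps = run_midpoints([0, 1, 0, 0])
n_rows, n_cols = len(row_seps) + 1, len(col_seps) + 1

# Merge the two cells of the first row into one header cell.
cells = merge_cells(n_rows, n_cols, [{(0, 0), (0, 1)}])
```

With this toy input the fine grid has three rows and two columns, and the merge step yields five cells, the first of which spans both columns.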

updated: Tue Jul 20 2021 13:18:55 GMT+0000 (UTC)

published: Mon Jul 12 2021 06:26:19 GMT+0000 (UTC)

arXiv
