SEMv2: Table Separation Line Detection Based on Conditional Convolution

Zhenrong Zhang; Pengfei Hu; Jiefeng Ma; Jun Du; Jianshu Zhang; Huihui Zhu; Baocai Yin; Bing Yin; Cong Liu

Table structure recognition is an indispensable element for enabling machines to comprehend tables. Its primary purpose is to identify the internal structure of a table. Nevertheless, due to the complexity and diversity of table structures and styles, it is highly challenging to parse tabular data into a structured format that machines can comprehend. In this work, we adhere to the principle of split-and-merge based methods and propose an accurate table structure recognizer, termed SEMv2 (SEM: Split, Embed and Merge). Unlike previous works in the "split" stage, we aim to address the instance-level discrimination problem for table separation lines and introduce a table separation line detection strategy based on conditional convolution. Specifically, we design the "split" stage in a top-down manner: it first detects table separation line instances and then dynamically predicts a table separation line mask for each instance. The final table separation line shape can be accurately obtained by processing the table separation line mask in a row-wise/column-wise manner. To comprehensively evaluate SEMv2, we also present a more challenging dataset for table structure recognition, dubbed iFLYTAB, which encompasses tables of multiple styles in various scenarios such as photos and scanned documents. Extensive experiments on publicly available datasets (e.g., SciTSR and PubTabNet) and on iFLYTAB demonstrate the efficacy of our proposed approach. The code and iFLYTAB dataset will be made publicly available upon acceptance of this paper.
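The abstract describes a "split" stage that works in three steps. It detects each separation line instance, predicts a mask for that instance with a dynamic (conditional) convolution, and then collapses the mask row-wise or column-wise into a line shape. The minimal NumPy sketch below illustrates these steps. It is not the authors' implementation, and it rests on several assumptions:

- A single dynamic 1x1 convolution stands in for the per-instance kernels a network would predict.
- The kernel weights are hand-set.
- The row-wise/column-wise step is assumed to take the highest-scoring position along each column (for a row separator) or each row (for a column separator).

The functions `dynamic_mask` and `line_from_mask` and the toy feature map are hypothetical.

```python
import numpy as np


def dynamic_mask(features, weights, bias):
    """Apply one instance-specific 1x1 convolution to shared mask features.

    features: (C, H, W) feature map shared by all instances.
    weights:  (C,) kernel for one separation line instance. In a real
              conditional-convolution model these would be predicted per
              instance; here they are hand-set (hypothetical).
    Returns an (H, W) mask of sigmoid scores in (0, 1).
    """
    logits = np.tensordot(weights, features, axes=([0], [0])) + bias
    return 1.0 / (1.0 + np.exp(-logits))


def line_from_mask(mask, axis, threshold=0.5):
    """Collapse a line mask to one coordinate per column or per row.

    axis=0: row separator, returns the best y for every column x.
    axis=1: column separator, returns the best x for every row y.
    Positions whose peak score is below `threshold` get -1 (no line there).
    This argmax rule is an assumed stand-in for the paper's processing.
    """
    idx = mask.argmax(axis=axis)
    peak = mask.max(axis=axis)
    return np.where(peak >= threshold, idx, -1)


# Toy 2-channel, 5x4 feature map: channel 0 traces a slightly curved row
# separator, channel 1 traces a straight one along the bottom row.
features = np.zeros((2, 5, 4))
curve_a = [1, 2, 2, 3]  # y-coordinate of line A at x = 0..3
curve_b = [4, 4, 4, 4]  # y-coordinate of line B at x = 0..3
for x, (ya, yb) in enumerate(zip(curve_a, curve_b)):
    features[0, ya, x] = 1.0
    features[1, yb, x] = 1.0

# Two "detected instances", each with its own (hypothetical) dynamic kernel.
mask_a = dynamic_mask(features, np.array([8.0, -8.0]), bias=-4.0)
mask_b = dynamic_mask(features, np.array([-8.0, 8.0]), bias=-4.0)

print(line_from_mask(mask_a, axis=0))  # [1 2 2 3]
print(line_from_mask(mask_b, axis=0))  # [4 4 4 4]
```

Both masks are computed from the same shared features. Only the per-instance kernel differs, and that is what lets each detected instance yield its own separation line. A column separator would be handled the same way by calling `line_from_mask(..., axis=1)` on its mask.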

Updated: Wed Mar 08 2023 05:15:01 UTC

Published: Wed Mar 08 2023 05:15:01 UTC

arXiv
