Aligning benchmark datasets for table structure recognition

Brandon Smock; Rohith Pesala; Robin Abraham

Benchmark datasets for table structure recognition (TSR) must be carefully processed to ensure they are annotated consistently. However, even if a dataset's annotations are self-consistent, there may be significant inconsistency across datasets, which can harm the performance of models trained and evaluated on them. In this work, we show that aligning these benchmarks (removing both errors and inconsistency between them) improves model performance significantly. We demonstrate this through a data-centric approach where we adopt one model architecture, the Table Transformer (TATR), that we hold fixed throughout. Baseline exact match accuracy for TATR evaluated on the ICDAR-2013 benchmark is 65% when trained on PubTables-1M, 42% when trained on FinTabNet, and 69% combined. After reducing annotation mistakes and inter-dataset inconsistency, performance of TATR evaluated on ICDAR-2013 increases substantially to 75% when trained on PubTables-1M, 65% when trained on FinTabNet, and 81% combined. We show through ablations over the modification steps that canonicalization of the table annotations has a significantly positive effect on performance, while other choices balance necessary trade-offs that arise when deciding a benchmark dataset's final composition. Overall, we believe our work has significant implications for benchmark design for TSR and potentially other tasks as well. Dataset processing and training code will be released at https://github.com/microsoft/table-transformer.
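The abstract reports exact match accuracy without spelling out how it is computed. The sketch below is one plausible reading, not the authors' evaluation code: it assumes a table counts as correct only when its full set of predicted cells equals the annotated set. The cell representation (row span, column span, text) and the function name `exact_match_accuracy` are hypothetical choices made for illustration.

```python
from typing import FrozenSet, List, Tuple

# Hypothetical cell representation (an assumption, not the paper's format):
# (row_start, row_end, col_start, col_end, text)
Cell = Tuple[int, int, int, int, str]
Table = FrozenSet[Cell]


def exact_match_accuracy(predicted: List[Table], ground_truth: List[Table]) -> float:
    """Fraction of tables whose predicted cell set equals the annotated cell set."""
    if len(predicted) != len(ground_truth):
        raise ValueError("predicted and ground_truth must have the same length")
    if not ground_truth:
        return 0.0
    # A table scores 1 only if every cell matches; a single wrong cell scores 0.
    matches = sum(1 for p, g in zip(predicted, ground_truth) if p == g)
    return matches / len(ground_truth)


# Two toy tables: the first is predicted perfectly, the second is not.
gt = [
    frozenset({(0, 0, 0, 1, "Header"), (1, 1, 0, 0, "a"), (1, 1, 1, 1, "b")}),
    frozenset({(0, 0, 0, 0, "x"), (0, 0, 1, 1, "y")}),
]
pred = [
    gt[0],
    frozenset({(0, 0, 0, 1, "x y")}),  # two cells wrongly merged into one
]
accuracy = exact_match_accuracy(pred, gt)
```

Under this all-or-nothing reading, an annotation convention that differs between training and evaluation data can turn an otherwise sensible prediction into a miss, which is consistent with the abstract's emphasis on aligning annotations across datasets.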

updated: Tue May 23 2023 18:57:24 GMT+0000 (UTC)

published: Wed Mar 01 2023 18:20:24 GMT+0000 (UTC)

arXiv
