SelfDocSeg: A Self-Supervised vision-based Approach towards Document Segmentation

Subhajit Maity; Sanket Biswas; Siladittya Manna; Ayan Banerjee; Josep Lladós; Saumik Bhattacharya; Umapada Pal

Document layout analysis is a well-known problem in the document research community and has been explored extensively, yielding a multitude of solutions ranging from text mining and recognition to graph-based representation and visual feature extraction. However, most existing works have ignored a crucial fact: the scarcity of labeled data. With growing internet connectivity in personal life, an enormous number of documents has become available in the public domain, making data annotation a tedious task. We address this challenge using self-supervision. Unlike the few existing self-supervised document segmentation approaches, which use text mining and textual labels, we use a completely vision-based approach in pre-training without any ground-truth label or its derivative. Instead, we generate pseudo-layouts from the document images to pre-train an image encoder to learn document object representation and localization in a self-supervised framework before fine-tuning it with an object detection model. We show that our pipeline sets a new benchmark in this context and performs on par with, if not better than, existing methods and their supervised counterparts. The code is publicly available at: https://github.com/MaitySubhajit/SelfDocSeg
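
To make the idea of a label-free pseudo-layout more concrete, here is a minimal sketch of how coarse layout boxes could be derived from a document image using pixel values alone. It is not the authors' pipeline (see the linked repository for that). It is a generic stand-in that binarizes the image, dilates the ink so that nearby strokes merge, and takes the bounding box of each connected region. The function names, the threshold of 128, and the dilation radius are all assumptions chosen for illustration.

```python
from collections import deque


def binarize(gray, threshold=128):
    """Mark pixels darker than `threshold` as ink (1), the rest as background (0)."""
    return [[1 if px < threshold else 0 for px in row] for row in gray]


def dilate(mask, radius=1):
    """Grow ink regions by `radius` pixels so that nearby strokes merge into blocks."""
    h, w = len(mask), len(mask[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if mask[y][x]:
                for ny in range(max(0, y - radius), min(h, y + radius + 1)):
                    for nx in range(max(0, x - radius), min(w, x + radius + 1)):
                        out[ny][nx] = 1
    return out


def pseudo_layout_boxes(gray, threshold=128, radius=1):
    """Return inclusive bounding boxes (x0, y0, x1, y1) of merged ink regions.

    `gray` is a list of rows of grayscale values in 0..255 (hypothetical input
    format; a real pipeline would load an image with an imaging library).
    """
    mask = dilate(binarize(gray, threshold), radius)
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    boxes = []
    for y in range(h):
        for x in range(w):
            if not mask[y][x] or seen[y][x]:
                continue
            # Breadth-first flood fill over 4-connected neighbours.
            queue = deque([(y, x)])
            seen[y][x] = True
            x0 = x1 = x
            y0 = y1 = y
            while queue:
                cy, cx = queue.popleft()
                x0, x1 = min(x0, cx), max(x1, cx)
                y0, y1 = min(y0, cy), max(y1, cy)
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            boxes.append((x0, y0, x1, y1))
    return boxes
```

Boxes produced this way need no annotation, which is the property the abstract relies on. In a pre-training setup of the kind described, such boxes would serve only as rough localization targets for the encoder, with the real labels entering later during fine-tuning.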

updated: Mon Aug 21 2023 02:14:41 GMT+0000 (UTC)

published: Mon May 01 2023 12:47:55 GMT+0000 (UTC)

arXiv
