Multiple Document Datasets Pre-training Improves Text Line Detection With Deep Neural Networks

Mélodie Boillet; Christopher Kermorvant; Thierry Paquet

In this paper, we introduce a fully convolutional network for the document layout analysis task. While state-of-the-art methods use models pre-trained on natural scene images, our method, Doc-UFCN, relies on a U-shaped model trained from scratch to detect objects in historical documents. We treat line segmentation, and more generally the layout analysis problem, as a pixel-wise classification task, so our model outputs a pixel-level labeling of the input images. We show that Doc-UFCN outperforms state-of-the-art methods on various datasets, and we demonstrate that components pre-trained on natural scene images are not required to reach good results. In addition, we show that pre-training on multiple document datasets can improve performance. We evaluate the models using various metrics to obtain a fair and complete comparison between the methods.
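To make the pixel-wise formulation concrete, here is a minimal sketch of the step that turns per-pixel class scores into a label map and then scores that map against a reference. It is illustrative only and is not the authors' code: the function names `label_pixels` and `iou`, the two-class setup (0 = background, 1 = text line), and the choice of intersection-over-union as the metric are all assumptions, since the abstract does not name the metrics used.

```python
from typing import List, Sequence

# Hypothetical types for illustration: scores[y][x][c] is the score of class c at pixel (y, x).
Scores = List[List[Sequence[float]]]
LabelMap = List[List[int]]


def label_pixels(scores: Scores) -> LabelMap:
    """Assign each pixel the index of its highest-scoring class (argmax)."""
    return [
        [max(range(len(pixel)), key=pixel.__getitem__) for pixel in row]
        for row in scores
    ]


def iou(pred: LabelMap, target: LabelMap, cls: int = 1) -> float:
    """Intersection-over-union of one class between two label maps.

    Assumed metric: the abstract only says "various metrics".
    """
    intersection = 0
    union = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            in_pred = p == cls
            in_target = t == cls
            intersection += in_pred and in_target
            union += in_pred or in_target
    # If the class appears in neither map, treat the maps as in perfect agreement.
    return intersection / union if union else 1.0


# Toy 2x2 image with two classes: 0 = background, 1 = text line (assumed labels).
scores = [
    [(0.9, 0.1), (0.2, 0.8)],
    [(0.4, 0.6), (0.7, 0.3)],
]
target = [
    [0, 1],
    [0, 0],
]
prediction = label_pixels(scores)
print(prediction)
print(iou(prediction, target, cls=1))
```

In a real system the scores would come from the network's final layer, and the label map would usually be post-processed into line polygons before evaluation; both steps are outside the scope of this sketch.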

updated: Mon Mar 29 2021 11:36:02 GMT+0000 (UTC)

published: Mon Dec 28 2020 09:48:33 GMT+0000 (UTC)

arXiv
