A Survey of Historical Document Image Datasets

Konstantina Nikolaidou; Mathias Seuret; Hamam Mokayed; Marcus Liwicki

This paper presents a systematic literature review of image datasets for document image analysis, focusing on historical documents such as handwritten manuscripts and early prints. Finding appropriate datasets for historical document analysis is a crucial prerequisite for research using different machine learning algorithms. However, finding such datasets is difficult because of the wide variety of the actual data (e.g., scripts, tasks, dates, support systems, and amount of deterioration), the different formats for data and label representation, and the different evaluation processes and benchmarks. This work fills this gap by presenting a meta-study on existing datasets. After a systematic selection process (according to PRISMA guidelines), we select 56 studies based on factors such as the year of publication, number of methods implemented in the article, reliability of the chosen algorithms, dataset size, and journal outlet. We summarize each study by assigning it to one of three pre-defined tasks: document classification, layout structure, or semantic analysis. For every dataset, we present the statistics, document type, language, tasks, input visual aspects, and ground truth information. In addition, we provide the benchmark tasks and results from these papers or recent competitions. We further discuss gaps and challenges in this domain. We advocate for providing conversion tools to common formats (e.g., COCO format for computer vision tasks) and for always providing a set of evaluation metrics, instead of just one, to make results comparable across studies.
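As an illustration of the kind of conversion tool the abstract advocates, the following is a minimal sketch that maps page-level region annotations to a COCO-style dictionary. The input layout (`file_name`, `width`, `height`, and `regions` as corner-coordinate tuples) and the function name `to_coco` are assumptions made for this example; they are not taken from the paper or from any surveyed dataset. Only the output keys follow the COCO object-detection convention, where `bbox` is `[x, y, width, height]`.

```python
import json


def to_coco(pages):
    """Convert a hypothetical list of annotated pages into a COCO-style dict.

    Each page is assumed to be a dict with "file_name", "width", "height",
    and "regions", where a region is (label, x_min, y_min, x_max, y_max).
    """
    category_ids = {}  # label -> category id, assigned in order of first use
    images, annotations = [], []

    for image_id, page in enumerate(pages, start=1):
        images.append({
            "id": image_id,
            "file_name": page["file_name"],
            "width": page["width"],
            "height": page["height"],
        })
        for label, x_min, y_min, x_max, y_max in page["regions"]:
            category_id = category_ids.setdefault(label, len(category_ids) + 1)
            box_w, box_h = x_max - x_min, y_max - y_min
            annotations.append({
                "id": len(annotations) + 1,
                "image_id": image_id,
                "category_id": category_id,
                # COCO stores boxes as [x, y, width, height], not as corners.
                "bbox": [x_min, y_min, box_w, box_h],
                "area": box_w * box_h,
                "iscrowd": 0,
            })

    categories = [{"id": cid, "name": name} for name, cid in category_ids.items()]
    return {"images": images, "annotations": annotations, "categories": categories}


# Made-up example input, for illustration only.
example_pages = [
    {
        "file_name": "page_001.png",
        "width": 800,
        "height": 1200,
        "regions": [
            ("text_line", 10, 20, 100, 50),
            ("initial", 5, 5, 45, 65),
            ("text_line", 10, 60, 100, 90),
        ],
    }
]

coco = to_coco(example_pages)
coco_json = json.dumps(coco, indent=2)
```

A dataset with polygon or pixel-level ground truth would need the `segmentation` field as well; this sketch covers bounding boxes only.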

updated: Mon Jul 25 2022 09:51:17 GMT+0000 (UTC)

published: Wed Mar 16 2022 09:56:48 GMT+0000 (UTC)

arXiv
