Digital Peter: Dataset, Competition and Handwriting Recognition Methods

Mark Potanin; Denis Dimitrov; Alex Shonenkov; Vladimir Bataev; Denis Karachev; Maxim Novopoltsev

This paper presents a new dataset of Peter the Great's manuscripts and describes a segmentation procedure that converts the original document images into individual lines. The dataset may help researchers train handwritten text recognition (HTR) models and serve as a benchmark for comparing them. It consists of 9,694 images and text files corresponding to lines in historical documents. The open machine learning competition Digital Peter was held on the basis of this dataset. The article describes the baseline solution for this competition as well as more advanced methods for handwritten text recognition. The full dataset and all code are publicly available.
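Because each line in the dataset is stored as an image plus a text file holding its transcription, a typical first step before training an HTR model is to pair the two. The following is a minimal Python sketch of that pairing. It assumes a hypothetical layout in which `images/<id>.jpg` and `texts/<id>.txt` share a file stem; the function name `pair_lines`, the folder names, and the `.jpg` extension are illustrative assumptions, not the published dataset's documented structure.

```python
from pathlib import Path


def pair_lines(images_dir, texts_dir, image_ext=".jpg"):
    """Match each line image to the transcription file with the same stem.

    Assumes a hypothetical layout: images_dir/<id><image_ext> and
    texts_dir/<id>.txt. Returns a list of (image_path, transcription)
    tuples, sorted by image file name.
    """
    images_dir, texts_dir = Path(images_dir), Path(texts_dir)
    pairs = []
    for img in sorted(images_dir.glob(f"*{image_ext}")):
        txt = texts_dir / f"{img.stem}.txt"
        if not txt.exists():
            # Skip images that have no matching transcription file.
            continue
        # Strip surrounding whitespace such as a trailing newline.
        text = txt.read_text(encoding="utf-8").strip()
        pairs.append((img, text))
    return pairs
```

Images without a matching text file are skipped rather than raising an error, so a partially downloaded or filtered copy of the data can still be loaded; a stricter loader could instead report such mismatches.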

updated: Tue Mar 16 2021 22:37:22 GMT+0000 (UTC)

published: Tue Mar 16 2021 22:37:22 GMT+0000 (UTC)

arXiv
