RusTitW: Russian Language Text Dataset for Visual Text in-the-Wild Recognition

Igor Markov; Sergey Nesteruk; Andrey Kuznetsov; Denis Dimitrov

Information surrounds people in modern life. Text is a very efficient type of information that people have used for communication for centuries. However, automated text-in-the-wild recognition remains a challenging problem. The major limitation for a deep learning (DL) system is the lack of training data. To achieve competitive performance, the training set must contain many samples that replicate real-world cases. While there are many high-quality datasets for English text recognition, there are no available datasets for the Russian language. In this paper, we present a large-scale human-labeled dataset for Russian text recognition in-the-wild. We also publish a synthetic dataset and code to reproduce the generation process.

updated: Wed Mar 29 2023 08:38:55 GMT+0000 (UTC)

published: Wed Mar 29 2023 08:38:55 GMT+0000 (UTC)

arXiv
