Why You Should Try the Real Data for the Scene Text Recognition

Vladimir Loginov

Recent work in text recognition has pushed recognition results to new levels. For a long time, however, the lack of large human-labeled natural text recognition datasets has forced researchers to train text recognition models on synthetic data. Even though synthetic datasets are very large (MJSynth and SynthText, the two best-known synthetic datasets, contain several million images each), their diversity may be insufficient compared to natural datasets such as ICDAR. Fortunately, the recently released text recognition annotation for the OpenImages V5 dataset has a number of instances comparable to the synthetic datasets and offers more diverse examples. We used this annotation with the text recognition head architecture from Yet Another Mask Text Spotter and obtained results comparable to the state of the art (SOTA). On some datasets we even outperformed previous SOTA models. In this paper we also introduce a text recognition model. The model's code is available.

updated: Thu Jul 29 2021 12:58:57 GMT+0000 (UTC)

published: Thu Jul 29 2021 12:58:57 GMT+0000 (UTC)

arXiv
