What If We Only Use Real Datasets for Scene Text Recognition? Toward Scene Text Recognition With Fewer Labels

Jeonghun Baek; Yusuke Matsui; Kiyoharu Aizawa

The scene text recognition (STR) task has a common practice: all state-of-the-art STR models are trained on large synthetic data. In contrast to this practice, training STR models on only a small number of real labels (STR with fewer labels) is important when we have to train STR models without synthetic data: for handwritten or artistic texts that are difficult to generate synthetically, and for languages other than English, for which synthetic data is not always available. However, there has been implicit common knowledge that training STR models on real data is nearly impossible because real data is insufficient. We consider that this common knowledge has obstructed the study of STR with fewer labels. In this work, we aim to reactivate STR with fewer labels by disproving this common knowledge. We consolidate recently accumulated public real data and show that we can train STR models satisfactorily with only real labeled data. Subsequently, we find simple data augmentation to fully exploit real data. Furthermore, we improve the models by collecting unlabeled data and introducing semi- and self-supervised methods. As a result, we obtain a model that is competitive with state-of-the-art methods. To the best of our knowledge, this is the first study that 1) shows sufficient performance using only real labels and 2) introduces semi- and self-supervised methods into STR with fewer labels. Our code and data are available at https://github.com/ku21fan/STR-Fewer-Labels

updated: Sat Jun 05 2021 12:53:58 GMT+0000 (UTC)

published: Sun Mar 07 2021 17:05:54 GMT+0000 (UTC)

arXiv
