Towards Open-Set Text Recognition via Label-to-Prototype Learning

Chang Liu; Chun Yang; Hai-Bo Qin; Xiaobin Zhu; Cheng-Lin Liu; Xu-Cheng Yin

Scene text recognition is a popular research topic and is used extensively in industry. Although many methods achieve satisfactory performance on close-set text recognition benchmarks, they become impractical in open-set scenarios, where collecting data or retraining models for novel characters can be costly. For example, annotating samples for foreign languages can be expensive, and retraining the model each time a novel character is discovered in historical documents costs both time and resources. In this paper, we introduce and formulate a new open-set text recognition task, which demands the capability to spot and recognize novel characters without retraining. We also propose a label-to-prototype learning framework as a baseline for this task. Specifically, the framework introduces a generalizable label-to-prototype mapping function to build prototypes (class centers) for both seen and unseen classes. An open-set predictor then recognizes or rejects samples according to the prototypes. The ability to reject out-of-set characters allows unknown characters in the incoming data stream to be spotted automatically. Extensive experiments show that our method achieves promising performance on a variety of zero-shot, close-set, and open-set text recognition datasets.
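
The prototype-and-reject idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the mapping function, the similarity measure, or the rejection rule, so the code below assumes a trivial stand-in mapping (L2 normalization of a hand-written glyph feature vector), cosine similarity, and a fixed threshold. All names (`label_to_prototype`, `build_prototypes`, `open_set_predict`, `UNKNOWN`) are hypothetical.

```python
import math

UNKNOWN = "<unknown>"  # hypothetical marker for rejected (out-of-set) samples


def l2_normalize(vec):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def label_to_prototype(glyph_features):
    """Assumed stand-in for the learned label-to-prototype mapping.

    In the paper this mapping is learned; here it only normalizes a
    feature vector that is supposed to describe the label's glyph.
    """
    return l2_normalize(glyph_features)


def build_prototypes(label_glyphs):
    """Build one prototype (class center) per label, seen or unseen."""
    return {label: label_to_prototype(g) for label, g in label_glyphs.items()}


def open_set_predict(feature, prototypes, threshold=0.8):
    """Return (label, score); the label is UNKNOWN if no prototype is close enough."""
    query = l2_normalize(feature)
    best_label, best_score = UNKNOWN, -1.0
    for label, proto in prototypes.items():
        score = sum(q * p for q, p in zip(query, proto))  # cosine similarity
        if score > best_score:
            best_label, best_score = label, score
    if best_score < threshold:  # assumed rejection rule: fixed similarity threshold
        return UNKNOWN, best_score
    return best_label, best_score


# Two "seen" classes with toy glyph features.
prototypes = build_prototypes({"a": [1.0, 0.0, 0.0], "b": [0.0, 1.0, 0.0]})
seen_label, _ = open_set_predict([0.9, 0.1, 0.0], prototypes)
novel_label, _ = open_set_predict([0.0, 0.0, 1.0], prototypes)

# Registering a novel class only adds a prototype; nothing is retrained.
prototypes["c"] = label_to_prototype([0.0, 0.0, 2.0])
after_label, _ = open_set_predict([0.0, 0.0, 1.0], prototypes)
```

In this toy setup, a sample that matches no existing prototype is rejected, and the same sample is recognized once a prototype for its class has been added. The point of the sketch is only that recognizing a new character reduces to extending the prototype set rather than retraining the feature extractor.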

updated: Sun Aug 07 2022 04:23:25 GMT+0000 (UTC)

published: Thu Mar 10 2022 06:22:51 GMT+0000 (UTC)

arXiv
