DEER: Detection-agnostic End-to-End Recognizer for Scene Text Spotting

Seonghyeon Kim; Seung Shin; Yoonsik Kim; Han-Cheol Cho; Taeho Kil; Jaeheung Surh; Seunghyun Park; Bado Lee; Youngmin Baek

Recent end-to-end scene text spotters have substantially improved the recognition of arbitrarily-shaped text instances. Common approaches to text spotting use region-of-interest pooling or segmentation masks to restrict features to a single text instance. However, this makes it hard for the recognizer to decode the correct sequence when the detection is inaccurate, i.e., when one or more characters are cropped out. Considering that it is hard to determine word boundaries accurately with the detector alone, we propose a novel Detection-agnostic End-to-End Recognizer (DEER) framework. The proposed method reduces the tight dependency between the detection and recognition modules by bridging them with a single reference point for each text instance instead of detected regions. This allows the decoder to recognize the text indicated by the reference point using features from the whole image. Since only a single point is required to recognize the text, the proposed method enables text spotting without an arbitrarily-shaped detector or bounding polygon annotations. Experimental results show that the proposed method achieves competitive results on regular and arbitrarily-shaped text spotting benchmarks. Further analysis shows that DEER is robust to detection errors. The code and dataset will be made publicly available.
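The idea of recognizing from a single reference point with whole-image features can be illustrated with a small toy. What follows is only a sketch under stated assumptions, not the authors' implementation: it swaps in plain scaled dot-product attention for whatever attention the DEER decoder actually uses, and the names `bilinear_sample`, `attend_from_reference_point`, and `toy` are hypothetical. The feature at the reference point serves as a query, and the attention runs over every location of the feature map, so nothing is cropped by a detected region.

```python
import math

def bilinear_sample(feat, x, y):
    """Sample an H x W x C feature map (nested lists) at continuous (x, y)."""
    h, w = len(feat), len(feat[0])
    # Clamp the point so that it stays inside the map.
    x = min(max(x, 0.0), w - 1.0)
    y = min(max(y, 0.0), h - 1.0)
    x0, y0 = int(math.floor(x)), int(math.floor(y))
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    dx, dy = x - x0, y - y0
    channels = len(feat[0][0])
    return [
        (1 - dy) * ((1 - dx) * feat[y0][x0][k] + dx * feat[y0][x1][k])
        + dy * ((1 - dx) * feat[y1][x0][k] + dx * feat[y1][x1][k])
        for k in range(channels)
    ]

def attend_from_reference_point(feat, ref_xy):
    """Attend over ALL locations, using the feature at ref_xy as the query.

    Assumption: plain scaled dot-product attention, used here for illustration.
    Returns (context_vector, attention_weights in row-major order).
    """
    query = bilinear_sample(feat, ref_xy[0], ref_xy[1])
    scale = math.sqrt(len(query))
    flat = [cell for row in feat for cell in row]  # whole image, no cropping
    scores = [sum(q * v for q, v in zip(query, cell)) / scale for cell in flat]
    top = max(scores)  # subtract the max for a numerically stable softmax
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    context = [
        sum(w * cell[k] for w, cell in zip(weights, flat))
        for k in range(len(query))
    ]
    return context, weights

# Toy 2 x 4 map with two "text instances": left half and right half.
toy = [
    [[4.0, 0.0], [4.0, 0.0], [0.0, 4.0], [0.0, 4.0]],
    [[4.0, 0.0], [4.0, 0.0], [0.0, 4.0], [0.0, 4.0]],
]
ctx_left, w_left = attend_from_reference_point(toy, (0.5, 0.5))
ctx_right, w_right = attend_from_reference_point(toy, (2.5, 0.5))
# Attention mass that each query places on the left half (columns 0 and 1).
left_mass_from_left = sum(w_left[i] for i in (0, 1, 4, 5))
left_mass_from_right = sum(w_right[i] for i in (0, 1, 4, 5))
```

In this toy, a reference point inside the left instance places nearly all of its attention on the left half, and a point inside the right instance does the opposite, even though both queries see the whole map. A real spotter would learn the features and decode characters autoregressively from such context vectors; that part is omitted here.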

updated: Thu Mar 10 2022 02:41:05 GMT+0000 (UTC)

published: Thu Mar 10 2022 02:41:05 GMT+0000 (UTC)

arXiv
