SPTS v2: Single-Point Scene Text Spotting

Yuliang Liu; Jiaxin Zhang; Dezhi Peng; Mingxin Huang; Xinyu Wang; Jingqun Tang; Can Huang; Dahua Lin; Chunhua Shen; Xiang Bai; Lianwen Jin

End-to-end scene text spotting has made significant progress thanks to the intrinsic synergy between text detection and recognition. Previous methods commonly treat manual annotations such as horizontal rectangles, rotated rectangles, quadrangles, and polygons as a prerequisite, and these are much more expensive to produce than single-point annotations. For the first time, we demonstrate that scene text spotting models can be trained with extremely low-cost single-point annotations using the proposed framework, termed SPTS v2. SPTS v2 retains the advantage of the auto-regressive Transformer through an Instance Assignment Decoder (IAD), which sequentially predicts the center points of all text instances within a single prediction sequence, while a Parallel Recognition Decoder (PRD) performs text recognition in parallel. These two decoders share the same parameters and are interactively connected by a simple but effective information transmission process that passes gradients and information between them. Comprehensive experiments on various existing benchmark datasets demonstrate that SPTS v2 can outperform previous state-of-the-art single-point text spotters with fewer parameters while achieving 19× faster inference. Most importantly, within the scope of SPTS v2, extensive experiments further reveal an important phenomenon: a single point serves as the optimal setting for scene text spotting compared with non-point, rectangular bounding box, and polygonal bounding box annotations. Such an attempt provides a significant opportunity for scene text spotting applications beyond the realms of existing paradigms. Code will be available at https://github.com/bytedance/SPTSv2.
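
To make the annotation formats compared in the abstract concrete, the sketch below collapses a polygon annotation into a single point. It is an illustration only and is not taken from the SPTS v2 code: the function name `polygon_to_center` and the sample coordinates are hypothetical, and taking the mean of the vertices is an assumption about how a center point might be derived from an existing polygon label.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def polygon_to_center(polygon: List[Point]) -> Point:
    """Collapse a polygon annotation to one point: the mean of its vertices.

    Assumption: the vertex mean stands in for the "center point" of a text
    instance. The abstract does not say how the center is defined.
    """
    if not polygon:
        raise ValueError("polygon must contain at least one vertex")
    xs = [x for x, _ in polygon]
    ys = [y for _, y in polygon]
    return (sum(xs) / len(xs), sum(ys) / len(ys))


# Hypothetical quadrangle drawn around one word (4 vertices, 8 numbers).
quad = [(10.0, 20.0), (110.0, 20.0), (110.0, 60.0), (10.0, 60.0)]

# The single-point label for the same word needs only 2 numbers.
center = polygon_to_center(quad)
print(center)  # (60.0, 40.0)
```

The quadrangle needs four clicks from an annotator while the point needs one, which is the cost difference the abstract refers to.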

updated: Tue May 30 2023 13:59:43 GMT+0000 (UTC)

published: Wed Jan 04 2023 14:20:14 GMT+0000 (UTC)

arXiv
