SPTS v2: Single-Point Scene Text Spotting

Yuliang Liu; Jiaxin Zhang; Dezhi Peng; Mingxin Huang; Xinyu Wang; Jingqun Tang; Can Huang; Dahua Lin; Chunhua Shen; Xiang Bai; Lianwen Jin

End-to-end scene text spotting has made significant progress owing to the intrinsic synergy between text detection and recognition. Previous methods commonly treat manual annotations such as horizontal rectangles, rotated rectangles, quadrangles, and polygons as a prerequisite, and these are much more expensive to produce than single-point annotations. Our new framework, SPTS v2, allows us to train high-performing text spotting models using single-point annotations. SPTS v2 preserves the advantage of the auto-regressive Transformer with an Instance Assignment Decoder (IAD), which sequentially predicts the center points of all text instances within a single prediction sequence, and uses a Parallel Recognition Decoder (PRD) to recognize the text of those instances in parallel. The two decoders share the same parameters and are interactively connected through a simple but effective information transmission process that passes gradients and information between them. Comprehensive experiments on various existing benchmark datasets demonstrate that SPTS v2 can outperform previous state-of-the-art single-point text spotters with fewer parameters while achieving 19× faster inference speed. Within the context of our SPTS v2 framework, our experiments suggest a potential preference for single-point representation in scene text spotting over other representations. Such an attempt provides a significant opportunity for scene text spotting applications beyond the realms of existing paradigms. Code is available at https://github.com/Yuliang-Liu/SPTSv2.
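The two-stage decoding described in the abstract can be illustrated with a toy control-flow sketch. This is not the authors' implementation: the function names `assign_instances` and `recognize_parallel`, the callables standing in for the shared Transformer decoder, and the toy data are all assumptions made for illustration. It only shows the idea that center points are predicted one after another, while the transcriptions for all centers advance together, one character per step.

```python
from typing import Callable, List, Optional, Sequence, Tuple

Point = Tuple[int, int]

def assign_instances(
    next_point: Callable[[Sequence[Point]], Optional[Point]],
    max_instances: int = 100,
) -> List[Point]:
    """IAD-like stage (hypothetical): predict centers sequentially.

    Each call sees all previously predicted centers; None ends the sequence.
    """
    centers: List[Point] = []
    for _ in range(max_instances):
        point = next_point(centers)
        if point is None:
            break
        centers.append(point)
    return centers

def recognize_parallel(
    centers: Sequence[Point],
    next_chars: Callable[[Sequence[Point], Sequence[str]], List[Optional[str]]],
    max_len: int = 25,
) -> List[str]:
    """PRD-like stage (hypothetical): one decoding step serves every instance.

    next_chars returns one character (or None for end-of-text) per center.
    """
    texts = ["" for _ in centers]
    done = [False] * len(centers)
    for _ in range(max_len):
        if all(done):
            break
        for i, ch in enumerate(next_chars(centers, texts)):
            if done[i]:
                continue
            if ch is None:
                done[i] = True
            else:
                texts[i] += ch
    return texts

# Toy stand-ins for a trained model (assumed data, not from the paper).
TOY = [((10, 20), "EXIT"), ((40, 55), "CAFE")]

def toy_next_point(prev: Sequence[Point]) -> Optional[Point]:
    return TOY[len(prev)][0] if len(prev) < len(TOY) else None

def toy_next_chars(centers: Sequence[Point], texts: Sequence[str]) -> List[Optional[str]]:
    lookup = dict(TOY)
    return [
        lookup[c][len(t)] if len(t) < len(lookup[c]) else None
        for c, t in zip(centers, texts)
    ]

centers = assign_instances(toy_next_point)
texts = recognize_parallel(centers, toy_next_chars)
print(list(zip(centers, texts)))
```

In this sketch the number of sequential steps grows with the number of instances plus the longest transcription, rather than with their product, which is one intuitive reading of why parallel recognition can shorten inference; the actual speedup reported in the abstract depends on the authors' implementation.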

updated: Tue Aug 08 2023 01:45:37 GMT+0000 (UTC)

published: Wed Jan 04 2023 14:20:14 GMT+0000 (UTC)

arXiv
