PAN++: Towards Efficient and Accurate End-to-End Spotting of Arbitrarily-Shaped Text

Wenhai Wang; Enze Xie; Xiang Li; Xuebo Liu; Ding Liang; Zhibo Yang; Tong Lu; Chunhua Shen

Scene text detection and recognition have been well explored in the past few years. Despite this progress, efficient and accurate end-to-end spotting of arbitrarily-shaped text remains challenging. In this work, we propose an end-to-end text spotting framework, termed PAN++, which can efficiently detect and recognize text of arbitrary shapes in natural scenes. PAN++ is based on the kernel representation, which reformulates a text line as a text kernel (central region) surrounded by peripheral pixels. By systematically comparing it with existing scene text representations, we show that our kernel representation can not only describe arbitrarily-shaped text but also distinguish adjacent text well. Moreover, as a pixel-based representation, the kernel representation can be predicted by a single fully convolutional network, which makes it well suited to real-time applications. Taking advantage of the kernel representation, we design a series of components as follows: 1) a computationally efficient feature enhancement network composed of stacked Feature Pyramid Enhancement Modules (FPEMs); 2) a lightweight detection head cooperating with Pixel Aggregation (PA); and 3) an efficient attention-based recognition head with Masked RoI. Benefiting from the kernel representation and the tailored components, our method achieves high inference speed while maintaining competitive accuracy. Extensive experiments show the superiority of our method. For example, the proposed PAN++ achieves an end-to-end text spotting F-measure of 64.9 at 29.2 FPS on the Total-Text dataset, which significantly outperforms the previous best method. Code will be available at: https://git.io/PAN.
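As a rough illustration of why the kernel representation described above can separate adjacent text, the toy sketch below labels each connected text kernel and then assigns the surrounding text pixels to the nearest kernel by breadth-first growth. This is an assumption-laden simplification, not the paper's Pixel Aggregation (whose details the abstract does not give); the function name `label_text_instances` and the binary masks are hypothetical and stand in for network predictions.

```python
from collections import deque


def label_text_instances(kernel_mask, text_mask):
    """Toy sketch: give every text pixel the label of the kernel that reaches it first.

    kernel_mask and text_mask are 2D lists of 0/1 (hypothetical network outputs).
    Returns (labels, number_of_instances); label 0 means background.
    """
    h, w = len(text_mask), len(text_mask[0])
    labels = [[0] * w for _ in range(h)]
    neighbors = ((1, 0), (-1, 0), (0, 1), (0, -1))  # 4-connectivity
    queue = deque()
    count = 0

    def inside(y, x):
        return 0 <= y < h and 0 <= x < w

    # Step 1: label each connected kernel region.
    for y in range(h):
        for x in range(w):
            if kernel_mask[y][x] and labels[y][x] == 0:
                count += 1
                labels[y][x] = count
                stack = [(y, x)]
                while stack:
                    cy, cx = stack.pop()
                    queue.append((cy, cx))  # seed for the growth step
                    for dy, dx in neighbors:
                        ny, nx = cy + dy, cx + dx
                        if (inside(ny, nx) and kernel_mask[ny][nx]
                                and labels[ny][nx] == 0):
                            labels[ny][nx] = count
                            stack.append((ny, nx))

    # Step 2: grow all kernels outward at once into unlabeled text pixels.
    while queue:
        cy, cx = queue.popleft()
        for dy, dx in neighbors:
            ny, nx = cy + dy, cx + dx
            if inside(ny, nx) and text_mask[ny][nx] and labels[ny][nx] == 0:
                labels[ny][nx] = labels[cy][cx]
                queue.append((ny, nx))
    return labels, count


# Two text instances whose text regions touch but whose kernels do not.
text = [[1, 1, 1, 1, 1, 1]]
kernel = [[0, 1, 0, 0, 1, 0]]
labels, count = label_text_instances(kernel, text)
print(labels, count)
```

In this example the text mask alone forms a single connected blob, so plain connected-component labeling would merge the two instances; seeding from the separated kernels keeps them apart.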

updated: Sun May 09 2021 07:46:43 GMT+0000 (UTC)

published: Sun May 02 2021 07:04:30 GMT+0000 (UTC)

arXiv
