Towards End-to-End Text Spotting in Natural Scenes

Peng Wang; Hui Li; Chunhua Shen

Text spotting in natural scene images is of great importance for many image understanding tasks. It comprises two sub-tasks: text detection and text recognition. In this work, we propose a unified network that simultaneously localizes and recognizes text in a single forward pass, avoiding intermediate processes such as image cropping, feature re-calculation, word separation, and character grouping. In contrast to existing approaches that treat text detection and recognition as two distinct tasks and tackle them one after the other, the proposed framework solves both tasks concurrently. The whole framework can be trained end-to-end and is able to handle text of arbitrary shapes. The convolutional features are calculated only once and shared by both the detection and recognition modules. Through multi-task training, the learned features become more discriminative, which improves the overall performance. By employing a 2D attention model in word recognition, the irregularity of text can be robustly addressed. The attention model provides the spatial location of each character, which not only helps local feature extraction in word recognition, but also indicates an orientation angle that is used to refine text localization. Our proposed method achieves state-of-the-art performance on several standard text spotting benchmarks, including both regular and irregular ones.
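The abstract states that 2D attention yields a spatial location for each character and that these locations indicate an orientation angle. The following is only a minimal sketch of that idea, not the authors' implementation: the function names (`softmax_2d`, `attention_center`, `orientation_angle`, `peaked_scores`) are hypothetical, the toy score maps are made up, and the principal-axis fit used to turn character centers into an angle is an assumption, since the abstract does not say how the angle is derived.

```python
import math

def softmax_2d(scores):
    """Normalize a 2D grid of attention scores so that all weights sum to 1."""
    peak = max(max(row) for row in scores)  # subtract the max for numerical stability
    exps = [[math.exp(s - peak) for s in row] for row in scores]
    total = sum(sum(row) for row in exps)
    return [[e / total for e in row] for row in exps]

def attention_center(weights):
    """Attention-weighted mean (x, y) position on the feature grid."""
    x = sum(w * j for row in weights for j, w in enumerate(row))
    y = sum(w * i for i, row in enumerate(weights) for w in row)
    return x, y

def orientation_angle(centers):
    """Angle in degrees of the principal axis through the character centers.

    Assumption: a principal-axis fit; image coordinates, with y pointing down.
    """
    n = len(centers)
    mx = sum(x for x, _ in centers) / n
    my = sum(y for _, y in centers) / n
    sxx = sum((x - mx) ** 2 for x, _ in centers)
    syy = sum((y - my) ** 2 for _, y in centers)
    sxy = sum((x - mx) * (y - my) for x, y in centers)
    return math.degrees(0.5 * math.atan2(2.0 * sxy, sxx - syy))

def peaked_scores(size, row, col, peak=20.0):
    """Toy attention scores: zero everywhere except one strongly activated cell."""
    grid = [[0.0] * size for _ in range(size)]
    grid[row][col] = peak
    return grid

# Three decoding steps whose attention peaks move along the diagonal of a 4x4 grid.
centers = [attention_center(softmax_2d(peaked_scores(4, k, k))) for k in range(3)]
angle = orientation_angle(centers)
print(centers, angle)
```

In this toy setup the character centers lie along the grid diagonal, so the estimated angle is close to 45 degrees. A real model would produce the attention scores from the shared convolutional features and the decoder state at each step.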

updated: Sat Jun 26 2021 03:36:25 GMT+0000 (UTC)

published: Fri Jun 14 2019 04:20:55 GMT+0000 (UTC)

arXiv
