Towards Unconstrained End-to-End Text Spotting

We propose an end-to-end trainable network that simultaneously detects and recognizes text of arbitrary shape, making substantial progress on the open problem of reading irregularly shaped scene text. We formulate arbitrary-shape text detection as an instance segmentation problem; an attention model then decodes the textual content of each irregularly shaped text region without rectification. To extract useful irregularly shaped text instance features from image-scale features, we propose a simple yet effective RoI masking step. Additionally, we show that predictions from an existing multi-step OCR engine can be leveraged as partially labeled training data, which significantly improves both the detection and recognition accuracy of our model. Our method surpasses the state of the art for end-to-end recognition on the ICDAR15 (straight) benchmark by 4.6%, and on the Total-Text (curved) benchmark by more than 16%.
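
The abstract names the RoI masking step but does not describe it. One plausible reading is that the image-level feature map is cropped to a text instance's bounding box and then multiplied by that instance's segmentation mask, so that features from neighboring text or background are zeroed out before recognition. The NumPy sketch below illustrates that reading only; the function name `roi_mask_features`, the array layout, and the box convention are assumptions made for illustration, not the authors' implementation.

```python
import numpy as np


def roi_mask_features(features, mask, box):
    """Crop a feature map to a box and zero out positions outside the mask.

    Hypothetical helper for illustration; not taken from the paper.

    features: array of shape (H, W, C), an image-level feature map.
    mask:     array of shape (H, W), 1 inside the text instance, 0 elsewhere.
    box:      (y0, x0, y1, x1), half-open bounds in feature-map coordinates.
    """
    y0, x0, y1, x1 = box
    crop = features[y0:y1, x0:x1, :]
    crop_mask = mask[y0:y1, x0:x1].astype(features.dtype)
    # Broadcast the 2-D mask across the channel axis.
    return crop * crop_mask[:, :, None]


# Toy example: a slanted two-row "text instance" on a 4x6 feature map.
features = np.ones((4, 6, 2), dtype=np.float32)
mask = np.zeros((4, 6), dtype=np.uint8)
mask[1, 1:4] = 1
mask[2, 2:5] = 1

masked = roi_mask_features(features, mask, (1, 1, 3, 5))
print(masked.shape)
print(masked[:, :, 0])
```

In this toy case the bounding box covers eight positions but only six belong to the instance; the two corner positions outside the mask come out as zeros, which is the effect a masking step would have on an irregularly shaped region.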

updated: Sat Aug 24 2019 23:41:07 GMT+0000 (UTC)

published: Sat Aug 24 2019 23:41:07 GMT+0000 (UTC)

arXiv
