Double Supervised Network with Attention Mechanism for Scene Text Recognition

Yuting Gao; Zheng Huang; Yuchen Dai; Cheng Xu; Kai Chen; Jie Tuo

In this paper, we propose the Double Supervised Network with Attention Mechanism (DSAN), a novel end-to-end trainable framework for scene text recognition. It incorporates a text attention module during feature extraction that forces the model to focus on text regions, and the whole framework is supervised by two branches. One supervision branch comes from context-level modelling; the other comes from an extra supervision enhancement branch that aims to tackle inexplicit semantic information at the character level. These two supervisions benefit each other and yield better performance. The proposed approach can recognize text of arbitrary length and does not need any predefined lexicon. Our method outperforms the current state-of-the-art methods on three text recognition benchmarks, reaching accuracies of 88.6%, 92.3% and 84.1% on IIIT5K, ICDAR2013 and SVT respectively, which demonstrates the effectiveness of the proposed method.
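The abstract does not give the loss formulation or the design of the attention module. As a minimal sketch under stated assumptions, the two ideas can be illustrated as follows: the text attention module is approximated as a sigmoid gate over per-position feature scores, and double supervision as the sum of two cross-entropy losses, assuming both branches predict one class distribution per target character. The weight `lam` and every function name below are hypothetical; this is not the authors' implementation.

```python
import math
from typing import List, Sequence

# Hypothetical sketch of the ideas in the abstract; NOT the DSAN authors' code.


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def apply_text_attention(features: Sequence[Sequence[float]],
                         scores: Sequence[float]) -> List[List[float]]:
    """Gate each spatial position's feature vector by sigmoid(score).

    Assumption: positions with high scores are text and keep most of their
    features; low-scoring (background) positions are suppressed.
    """
    return [[f * sigmoid(s) for f in feat] for feat, s in zip(features, scores)]


def cross_entropy(logits: Sequence[float], target: int) -> float:
    """Negative log-softmax probability of the target class (numerically stable)."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum_exp - logits[target]


def sequence_loss(step_logits: Sequence[Sequence[float]],
                  targets: Sequence[int]) -> float:
    """Mean cross-entropy over the character positions of one word."""
    assert len(step_logits) == len(targets)
    return sum(cross_entropy(step, t) for step, t in zip(step_logits, targets)) / len(targets)


def double_supervised_loss(context_logits: Sequence[Sequence[float]],
                           char_logits: Sequence[Sequence[float]],
                           targets: Sequence[int],
                           lam: float = 1.0) -> float:
    """Total loss = context-level branch loss + lam * character-level branch loss.

    `lam` is a hypothetical weighting hyperparameter, not taken from the paper.
    """
    return sequence_loss(context_logits, targets) + lam * sequence_loss(char_logits, targets)
```

For example, with uniform logits over a three-symbol alphabet each branch contributes log 3, so `double_supervised_loss([[0.0, 0.0, 0.0]] * 2, [[0.0, 0.0, 0.0]] * 2, [0, 2])` is 2 log 3 (about 2.197). Summing the two losses lets gradients from both branches reach a shared feature extractor, which is one plausible reading of the claim that the two supervisions benefit each other.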

updated: Tue Oct 22 2019 13:05:11 GMT+0000 (UTC)

published: Thu Aug 02 2018 06:01:52 GMT+0000 (UTC)

arXiv