Conceptual Text Region Network: Cognition-Inspired Accurate Scene Text Detection

Chenwei Cui; Liangfu Lu; Zhiyuan Tan; Amir Hussain

Segmentation-based methods are widely used for scene text detection because they describe arbitrarily shaped text instances well. However, two major problems remain: 1) current label generation techniques are mostly empirical and lack theoretical support, which discourages elaborate label design; 2) as a result, most methods rely heavily on text kernel segmentation, which is unstable and requires careful tuning. To address these challenges, we propose a human cognition-inspired framework termed Conceptual Text Region Network (CTRNet). The framework uses Conceptual Text Regions (CTRs), a class of cognition-based tools with good mathematical properties that allow for sophisticated label design. Another component of CTRNet is an inference pipeline that, with the help of CTRs, completely eliminates the need for text kernel segmentation. Compared with previous segmentation-based methods, our approach is not only more interpretable but also more accurate. Experimental results show that CTRNet achieves state-of-the-art performance on the CTW1500, Total-Text, MSRA-TD500, and ICDAR 2015 benchmark datasets, yielding performance gains of up to 2.0%. Notably, to the best of our knowledge, CTRNet is among the first detection models to achieve F-measures higher than 85.0% on all four benchmarks, with remarkable consistency and stability.

updated: Tue Mar 16 2021 16:28:33 GMT+0000 (UTC)

published: Tue Mar 16 2021 16:28:33 GMT+0000 (UTC)

arXiv
