Polygon-free: Unconstrained Scene Text Detection with Box Annotations

Weijia Wu; Enze Xie; Ruimao Zhang; Wenhai Wang; Hong Zhou; Ping Luo

Although a polygon represents text more accurately than an upright bounding box, polygon annotations are extremely expensive and difficult to produce. Unlike existing works that rely on fully supervised training with polygon annotations, this study proposes an unconstrained text detection system termed Polygon-free (PF), in which most existing polygon-based text detectors (e.g., PSENet [33], DB [16]) are trained with only upright bounding box annotations. Our core idea is to transfer knowledge from synthetic data to real data to enhance the supervision information of upright bounding boxes. This is made possible by a simple segmentation network, the Skeleton Attention Segmentation Network (SASN), which includes three vital components (channel attention, spatial attention, and a skeleton attention map) and one soft cross-entropy loss. Experiments demonstrate that the proposed Polygon-free system can be combined with general detectors (e.g., EAST, PSENet, DB) to yield surprisingly high-quality pixel-level results with only upright bounding box annotations on a variety of datasets (e.g., ICDAR2019-Art, TotalText, ICDAR2015). For example, without using polygon annotations, PSENet achieves an 80.5% F-score on TotalText [3] (vs. 80.9% for its fully supervised counterpart), 31.1% better than training directly with upright bounding box annotations, while saving more than 80% of labeling costs. We hope that PF can provide a new perspective on text detection that reduces labeling costs. The code can be found at https://github.com/weijiawu/Unconstrained-Text-Detection-with-Box-Supervisionand-Dynamic-Self-Training.
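To make the gap between the two annotation types concrete, the following minimal sketch derives an upright bounding box from a polygon and measures how much of the box is actually text. It is an illustration only, not code from the paper or its repository: the function names and the example polygon are hypothetical, and the polygon area uses the standard shoelace formula.

```python
from typing import List, Tuple

Point = Tuple[float, float]
Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def upright_box(polygon: List[Point]) -> Box:
    """Smallest axis-aligned box that contains every polygon vertex."""
    xs = [x for x, _ in polygon]
    ys = [y for _, y in polygon]
    return min(xs), min(ys), max(xs), max(ys)


def polygon_area(polygon: List[Point]) -> float:
    """Area of a simple polygon via the shoelace formula."""
    total = 0.0
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]  # wrap around to close the polygon
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def box_area(box: Box) -> float:
    x_min, y_min, x_max, y_max = box
    return (x_max - x_min) * (y_max - y_min)


# Hypothetical rotated text region, given as polygon vertices in order.
polygon = [(0.0, 2.0), (4.0, 0.0), (5.0, 2.0), (1.0, 4.0)]
box = upright_box(polygon)

# Fraction of the upright box that the text polygon actually covers.
coverage = polygon_area(polygon) / box_area(box)
print(box, coverage)
```

For this example the polygon covers only half of its upright box, so a detector trained directly on the box would treat a large share of background pixels as text. This is the kind of weak supervision signal that the abstract says PF strengthens with knowledge transferred from synthetic data.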

updated: Thu May 26 2022 10:47:26 GMT+0000 (UTC)

published: Thu Nov 26 2020 14:19:33 GMT+0000 (UTC)

arXiv
