TextSnake: A Flexible Representation for Detecting Text of Arbitrary Shapes

Shangbang Long; Jiaqiang Ruan; Wenjie Zhang; Xin He; Wenhao Wu; Cong Yao

Driven by deep neural networks and large-scale datasets, scene text detection methods have progressed substantially over the past years, continuously refreshing the performance records on various standard benchmarks. However, limited by the representations adopted to describe text (axis-aligned rectangles, rotated rectangles, or quadrangles), existing methods may fall short when dealing with more free-form text instances, such as curved text, which are very common in real-world scenarios. To tackle this problem, we propose a more flexible representation for scene text, termed TextSnake, which is able to effectively represent text instances in horizontal, oriented, and curved forms. In TextSnake, a text instance is described as a sequence of ordered, overlapping disks centered at symmetric axes, each of which is associated with a potentially variable radius and orientation. These geometry attributes are estimated via a Fully Convolutional Network (FCN) model. In experiments, the text detector based on TextSnake achieves state-of-the-art or comparable performance on Total-Text and SCUT-CTW1500, two newly published benchmarks with special emphasis on curved text in natural images, as well as on the widely used datasets ICDAR 2015 and MSRA-TD500. Specifically, TextSnake outperforms the baseline on Total-Text by more than 40% in F-measure.
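
The disk-sequence representation described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the `Disk` class, its field names, and the helper functions are hypothetical, and the disks are built by hand on a toy arc, whereas the paper estimates the geometry with an FCN. The sketch only shows how an ordered list of overlapping disks, each with a center, radius, and orientation, can cover a curved text region and yield an approximate outline.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Disk:
    """One disk of the sequence (field names are illustrative assumptions)."""
    cx: float     # x of the disk center, a point on the text center line
    cy: float     # y of the disk center
    r: float      # radius, roughly half the local text height
    theta: float  # orientation of the center line at this disk, in radians


def covers(disks: List[Disk], x: float, y: float) -> bool:
    """A point belongs to the text region if at least one disk contains it."""
    return any((x - d.cx) ** 2 + (y - d.cy) ** 2 <= d.r ** 2 for d in disks)


def rasterize(disks: List[Disk], width: int, height: int) -> List[List[bool]]:
    """Boolean mask indexed as mask[y][x], the union of all disks."""
    return [[covers(disks, x, y) for x in range(width)] for y in range(height)]


def outline(disks: List[Disk]) -> List[Tuple[float, float]]:
    """Approximate polygon: offset each center by +/- r along the local normal."""
    side_a, side_b = [], []
    for d in disks:
        nx, ny = -math.sin(d.theta), math.cos(d.theta)  # unit normal to the center line
        side_a.append((d.cx + d.r * nx, d.cy + d.r * ny))
        side_b.append((d.cx - d.r * nx, d.cy - d.r * ny))
    # Walk one side forward and the other backward to close the polygon.
    return side_a + side_b[::-1]


def arc_of_disks(n: int = 9, arc_radius: float = 40.0,
                 text_radius: float = 6.0) -> List[Disk]:
    """Toy curved instance: centers on a circular arc around (50, 50)."""
    disks = []
    for i in range(n):
        a = math.radians(210 + 15 * i)       # position angle on the circle
        disks.append(Disk(cx=50 + arc_radius * math.cos(a),
                          cy=50 + arc_radius * math.sin(a),
                          r=text_radius,
                          theta=a + math.pi / 2))  # tangent direction of the arc
    return disks


if __name__ == "__main__":
    disks = arc_of_disks()
    mask = rasterize(disks, 100, 60)
    print(sum(cell for row in mask for cell in row), "pixels covered")
    print(len(outline(disks)), "outline vertices")
```

Because every disk carries its own radius and orientation, the same structure also describes straight horizontal or rotated text (constant `theta`), which is the flexibility the abstract claims over rectangles and quadrangles.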

updated: Tue Aug 18 2020 00:54:35 GMT+0000 (UTC)

published: Wed Jul 04 2018 12:37:07 GMT+0000 (UTC)

arXiv
