Arbitrary Shape Text Detection via Segmentation with Probability Maps

Shi-Xue Zhang; Xiaobin Zhu; Lei Chen; Jie-Bo Hou; Xu-Cheng Yin

Arbitrary shape text detection is challenging because of widely varying sizes and aspect ratios, arbitrary orientations and shapes, and inaccurate annotations. Because pixel-level prediction scales to arbitrary shapes, segmentation-based methods can adapt to text of various shapes and have attracted considerable attention recently. However, accurate pixel-level annotation of text is prohibitively difficult to obtain, and existing datasets for scene text detection provide only coarse-grained boundary annotations. Consequently, the annotated regions always contain numerous mislabeled text or background pixels, which degrades the performance of segmentation-based text detection methods. Generally speaking, whether a pixel belongs to text is closely related to its distance from the adjacent annotation boundary. Based on this observation, we propose an innovative and robust segmentation-based detection method that uses probability maps to detect text instances accurately. Specifically, we adopt a Sigmoid Alpha Function (SAF) to convert the distances between boundaries and the pixels inside them into a probability map. However, a single probability map cannot cover complex probability distributions well because of the uncertainty in coarse-grained text boundary annotations. We therefore adopt a group of probability maps computed by a series of Sigmoid Alpha Functions to describe the possible probability distributions. In addition, we propose an iterative model that learns to predict and assimilate probability maps, providing enough information to reconstruct text instances. Finally, simple region growth algorithms aggregate the probability maps into complete text instances. Experimental results demonstrate that our method achieves state-of-the-art detection accuracy on several benchmarks.
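
The idea of turning boundary distances into a group of probability maps, and then growing regions over them, can be illustrated with a small sketch. The abstract does not state the SAF formula or the region growth procedure, so everything below is an assumption made for illustration: the mapping 2 * sigmoid(alpha * d) - 1, the function names, the alpha values, and the thresholds are all hypothetical and are not taken from the paper.

```python
import math
from collections import deque


def distance_to_boundary(mask):
    """Euclidean distance from each text pixel (1) to the nearest background pixel (0).

    Brute force, fine for tiny examples. Assumes the mask contains at least
    one background pixel.
    """
    h, w = len(mask), len(mask[0])
    background = [(r, c) for r in range(h) for c in range(w) if mask[r][c] == 0]
    dist = [[0.0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            if mask[r][c] == 1:
                dist[r][c] = min(math.hypot(r - br, c - bc) for br, bc in background)
    return dist


def saf(d, alpha):
    """ASSUMED sigmoid-shaped mapping: 0 at distance 0, approaching 1 far inside.

    Expects d >= 0 and alpha > 0. Larger alpha makes the curve steeper.
    """
    return 2.0 / (1.0 + math.exp(-alpha * d)) - 1.0


def probability_maps(mask, alphas):
    """One probability map per alpha, all derived from the same distance map."""
    dist = distance_to_boundary(mask)
    return [[[saf(d, a) for d in row] for row in dist] for a in alphas]


def grow_region(seed_map, support_map, seed_thr, support_thr):
    """Hypothetical region growth: start from pixels that pass seed_thr in
    seed_map, then expand to 4-connected neighbours that pass support_thr
    in support_map."""
    h, w = len(seed_map), len(seed_map[0])
    region = [[False] * w for _ in range(h)]
    queue = deque()
    for r in range(h):
        for c in range(w):
            if seed_map[r][c] >= seed_thr:
                region[r][c] = True
                queue.append((r, c))
    while queue:
        r, c = queue.popleft()
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if (0 <= nr < h and 0 <= nc < w and not region[nr][nc]
                    and support_map[nr][nc] >= support_thr):
                region[nr][nc] = True
                queue.append((nr, nc))
    return region


# Toy coarse annotation: a 3x5 text rectangle inside a 5x7 image.
mask = [
    [0, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 1, 1, 0],
    [0, 1, 1, 1, 1, 1, 0],
    [0, 1, 1, 1, 1, 1, 0],
    [0, 0, 0, 0, 0, 0, 0],
]
flat_map, steep_map = probability_maps(mask, alphas=[0.5, 3.0])

# Seeds come from the flat map (only the text centre is confident);
# growth is bounded by the steep map (confident almost up to the boundary).
region = grow_region(flat_map, steep_map, seed_thr=0.4, support_thr=0.5)
```

In this toy case the flat map (small alpha) is confident only along the centre of the rectangle, while the steep map (large alpha) is confident almost up to the annotated boundary, so growing from the centre seeds recovers the full rectangle. In the method described above the maps are predicted by a learned iterative model rather than computed from a ground-truth mask; the sketch only shows how such maps could be generated as targets and aggregated.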

updated: Fri Aug 26 2022 03:29:00 GMT+0000 (UTC)

published: Fri Aug 26 2022 03:29:00 GMT+0000 (UTC)

arXiv
