Arbitrary Shape Text Detection via Boundary Transformer

Shi-Xue Zhang; Chun Yang; Xiaobin Zhu; Xu-Cheng Yin

In arbitrary shape text detection, locating accurate text boundaries is challenging. Existing methods often suffer from indirect text boundary modeling or complex post-processing. In this paper, we present a unified coarse-to-fine framework via boundary learning for arbitrary shape text detection, which can accurately and efficiently locate text boundaries without post-processing. Our method explicitly models the text boundary via an innovative iterative boundary transformer in a coarse-to-fine manner. In this way, it directly obtains accurate text boundaries and dispenses with complex post-processing, which improves efficiency. Specifically, our method mainly consists of a feature extraction backbone, a boundary proposal module, and an iteratively optimized boundary transformer module. The boundary proposal module, which consists of multi-layer dilated convolutions, computes important prior information (including a classification map, a distance field, and a direction field) that is used to generate coarse boundary proposals and to guide the boundary transformer's optimization. The boundary transformer module adopts an encoder-decoder structure, in which the encoder is built from multi-layer transformer blocks with residual connections and the decoder is a simple multi-layer perceptron (MLP). Under the guidance of the prior information, the boundary transformer module gradually refines the coarse boundary proposals via iterative boundary deformation. Furthermore, we propose a novel boundary energy loss (BEL), which introduces an energy minimization constraint and a monotonically decreasing energy constraint to further optimize and stabilize the learning of boundary refinement. Extensive experiments on publicly available and challenging datasets demonstrate the state-of-the-art performance and promising efficiency of our method.
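The abstract names the two BEL constraints but gives no formulas, so the following is only a toy sketch of the idea, not the authors' implementation. It refines a coarse closed boundary over a few iterations and scores the sequence with one term that keeps energies low and one hinge term that penalizes any iteration whose energy rises. Everything here is an assumption made for illustration: the names `toy_energy`, `toy_refine`, and `boundary_energy_loss`, the `weight` parameter, the circle-shaped target, and the fixed radial update that stands in for the learned boundary transformer.

```python
import math

def circle_points(cx, cy, r, n):
    """Return n evenly spaced control points on a circle (a toy closed boundary)."""
    return [(cx + r * math.cos(2 * math.pi * k / n),
             cy + r * math.sin(2 * math.pi * k / n)) for k in range(n)]

def toy_energy(points, center, radius):
    """Hypothetical energy: mean squared radial gap between points and a target circle."""
    cx, cy = center
    gaps = [(math.hypot(x - cx, y - cy) - radius) ** 2 for x, y in points]
    return sum(gaps) / len(gaps)

def toy_refine(points, center, radius, step=0.5):
    """Stand-in for a learned deformation: move each point a fraction of the way to the target."""
    cx, cy = center
    refined = []
    for x, y in points:
        d = math.hypot(x - cx, y - cy)  # assumed > 0 in this toy setup
        scale = 1.0 + step * (radius - d) / d
        refined.append((cx + (x - cx) * scale, cy + (y - cy) * scale))
    return refined

def boundary_energy_loss(energies, weight=1.0):
    """Assumed form of the two constraints, given energies [E_0, E_1, ..., E_K].

    minimization: mean energy of the refined boundaries E_1..E_K
    monotonic:    sum of hinge penalties max(0, E_i - E_{i-1})
    """
    minimization = sum(energies[1:]) / (len(energies) - 1)
    monotonic = sum(max(0.0, later - earlier)
                    for earlier, later in zip(energies, energies[1:]))
    return minimization + weight * monotonic, minimization, monotonic

# Coarse proposal: a circle of radius 6 around a target circle of radius 10.
center, target_radius = (0.0, 0.0), 10.0
boundary = circle_points(0.0, 0.0, 6.0, 20)
energies = [toy_energy(boundary, center, target_radius)]
for _ in range(3):  # three refinement iterations
    boundary = toy_refine(boundary, center, target_radius)
    energies.append(toy_energy(boundary, center, target_radius))

loss, minimization, monotonic = boundary_energy_loss(energies)
```

In this toy run every iteration halves the remaining radial gap, so the energies fall at each step and the hinge term contributes nothing. A sequence such as `[1.0, 2.0, 1.5]`, where the first refinement makes the boundary worse, would incur a positive monotonic penalty.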

updated: Tue Jun 20 2023 03:00:29 GMT+0000 (UTC)

published: Wed May 11 2022 07:59:13 GMT+0000 (UTC)

arXiv
