Weakly-Supervised Text Instance Segmentation

Xinyan Zu; Haiyang Yu; Bin Li; Xiangyang Xue

Text segmentation is a challenging vision task with many downstream applications. Current text segmentation methods require pixel-level annotations, which are costly in human labor and restrict the range of application scenarios. In this paper, we make the first attempt at weakly-supervised text instance segmentation by bridging text recognition and text segmentation. The insight is that text recognition methods provide a precise attention location for each text instance, and this attention location can be fed to both a text adaptive refinement head (TAR) and a text segmentation head. Specifically, the proposed TAR generates pseudo labels by performing two-stage iterative refinement on the attention location so that it fits the accurate boundaries of the corresponding text instance. Meanwhile, the text segmentation head takes the rough attention location and predicts segmentation masks, which are supervised by these pseudo labels. In addition, we design a mask-augmented contrastive learning scheme that treats our segmentation result as an augmented version of the input text image, thereby improving the visual representation and further enhancing both recognition and segmentation performance. Experimental results demonstrate that the proposed method significantly outperforms weakly-supervised instance segmentation methods on the ICDAR13-FST (18.95% improvement) and TextSeg (17.80% improvement) benchmarks.
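The mask-augmented contrastive idea described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the loss, so a standard InfoNCE objective is assumed here, and the names `apply_mask`, `info_nce`, and the `temperature` value are hypothetical. Each image embedding and the embedding of its masked (segmented) version are treated as a positive pair, while the other samples in the batch act as negatives.

```python
import numpy as np


def apply_mask(images, masks):
    """Build the 'augmented view' by keeping only pixels inside the predicted mask.

    images, masks: arrays of identical shape (batch, height, width); masks in [0, 1].
    """
    return images * masks


def l2_normalize(x, eps=1e-8):
    """Scale each row to (approximately) unit length."""
    return x / (np.linalg.norm(x, axis=1, keepdims=True) + eps)


def info_nce(z_img, z_masked, temperature=0.1):
    """InfoNCE loss (an assumed choice, not confirmed by the source).

    Row i of z_img and row i of z_masked form the positive pair;
    all other rows of z_masked serve as negatives for row i.
    """
    z_img = l2_normalize(z_img)
    z_masked = l2_normalize(z_masked)
    logits = z_img @ z_masked.T / temperature          # (batch, batch) similarities
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_prob)))
```

In this sketch the loss is small when each image embedding is closest to the embedding of its own masked version, and large when the pairs are mismatched, which is the behavior a contrastive objective of this kind relies on.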

updated: Thu Mar 23 2023 07:56:07 GMT+0000 (UTC)

published: Mon Mar 20 2023 03:56:47 GMT+0000 (UTC)

arXiv
