I3CL: Intra- and Inter-Instance Collaborative Learning for Arbitrary-shaped Scene Text Detection

Jian Ye; Jing Zhang; Juhua Liu; Bo Du; Dacheng Tao

Existing methods for arbitrary-shaped text detection in natural scenes face two critical issues: 1) fractured detections at the gaps within a text instance, and 2) inaccurate detections of arbitrary-shaped text instances against diverse background contexts. To address these issues, we propose a novel method named Intra- and Inter-Instance Collaborative Learning (I3CL). Specifically, to address the first issue, we design an effective convolutional module with multiple receptive fields, which collaboratively learns better character and gap feature representations at local and long ranges inside a text instance. To address the second issue, we devise an instance-based transformer module to exploit the dependencies between different text instances and a global context module to exploit the semantic context from the shared background; together, these modules collaboratively learn more discriminative text feature representations. In this way, I3CL can effectively exploit intra- and inter-instance dependencies together in a unified, end-to-end trainable framework. In addition, to make full use of unlabeled data, we design an effective semi-supervised learning method that leverages pseudo labels via an ensemble strategy. Without bells and whistles, experimental results show that the proposed I3CL sets new state-of-the-art results on three challenging public benchmarks, i.e., an F-measure of 77.5% on ICDAR2019-ArT, 86.9% on Total-Text, and 86.4% on CTW-1500. Notably, our I3CL with the ResNeSt-101 backbone ranked first on the ICDAR2019-ArT leaderboard. The source code will be made publicly available.
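
The benchmark results above are reported as F-measure, the harmonic mean of precision and recall. As a minimal sketch of how that number is formed, the snippet below computes it from detection counts. The function name `f_measure` and the counts are illustrative assumptions, not taken from the paper, and the rule for matching a detection to a ground-truth instance is set by each benchmark's evaluation protocol and is not modeled here.

```python
def f_measure(true_positives: int, num_detections: int, num_ground_truth: int) -> float:
    """Return the harmonic mean of precision and recall.

    true_positives   -- detections matched to a ground-truth text instance
    num_detections   -- all detections produced by the detector
    num_ground_truth -- all annotated text instances
    """
    # With no detections or no annotations, precision or recall is undefined;
    # this sketch returns 0.0 in that case (an assumed convention).
    if num_detections == 0 or num_ground_truth == 0:
        return 0.0
    precision = true_positives / num_detections
    recall = true_positives / num_ground_truth
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts for illustration only (not results from the paper).
score = f_measure(true_positives=80, num_detections=100, num_ground_truth=120)
print(f"{score:.3f}")  # prints 0.727
```

Because it is a harmonic mean, the score is pulled toward the lower of precision and recall, so a detector cannot reach a high F-measure by maximizing only one of them.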

updated: Mon Aug 16 2021 08:39:31 GMT+0000 (UTC)

published: Tue Aug 03 2021 07:48:12 GMT+0000 (UTC)

arXiv
