Existing methods for arbitrary-shaped text detection in natural scenes face two critical issues: 1) fragmented detections at the gaps within a text instance, and 2) inaccurate detections of arbitrary-shaped text instances with diverse background contexts. To address these issues, we propose a novel method named Intra- and Inter-Instance Collaborative Learning (I3CL). Specifically, to address the first issue, we design an effective convolutional module with multiple receptive fields, which collaboratively learns better character and gap feature representations at both local and long ranges inside a text instance. To address the second issue, we devise an instance-based transformer module that exploits the dependencies between different text instances and a pixel-based transformer module that exploits the global context from the shared background; together, these modules collaboratively learn more discriminative text feature representations. In this way, I3CL exploits intra- and inter-instance dependencies together in a unified, end-to-end trainable framework. Experimental results show that I3CL achieves new state-of-the-art performance on three challenging public benchmarks, with F-measures of 76.4% on ICDAR2019-ArT, 86.2% on Total-Text, and 85.8% on CTW-1500. In addition, I3CL with a ResNeSt-101 backbone ranked first on the ICDAR2019-ArT leaderboard. The source code will be made publicly available.
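The abstract does not specify how the convolutional module obtains its multiple receptive fields. One common way to do this is to run parallel convolution branches with different dilation rates and fuse their outputs. The pure-Python sketch below assumes that design to show why a wider receptive field helps at gaps: a pixel in the gap between two characters gets no response from the dilation-1 branch but does from a dilation-3 branch that reaches the characters on both sides. The function names, the summation-based fusion, and the toy input are hypothetical illustrations, not the authors' implementation.

```python
from typing import List

Grid = List[List[float]]


def receptive_field(kernel_size: int, dilation: int) -> int:
    """Side length (in pixels) covered by one stride-1 dilated convolution."""
    return (kernel_size - 1) * dilation + 1


def dilated_conv2d(x: Grid, w: Grid, dilation: int) -> Grid:
    """Naive 2D cross-correlation with zero padding and 'same' output size.

    Assumes a square kernel with odd side length.
    """
    h, width = len(x), len(x[0])
    k = len(w)
    c = k // 2  # kernel centre offset
    out = [[0.0] * width for _ in range(h)]
    for i in range(h):
        for j in range(width):
            acc = 0.0
            for u in range(k):
                for v in range(k):
                    # Kernel taps are spaced `dilation` pixels apart.
                    ii = i + (u - c) * dilation
                    jj = j + (v - c) * dilation
                    if 0 <= ii < h and 0 <= jj < width:  # zero padding
                        acc += w[u][v] * x[ii][jj]
            out[i][j] = acc
    return out


def multi_receptive_field(x: Grid, w: Grid, dilations: List[int]) -> Grid:
    """Hypothetical multi-branch block: one dilated conv per rate, summed."""
    branches = [dilated_conv2d(x, w, d) for d in dilations]
    return [
        [sum(b[i][j] for b in branches) for j in range(len(x[0]))]
        for i in range(len(x))
    ]


# Toy "text line": two character blobs separated by a 3-pixel gap.
text_row = [[1.0, 1.0, 0.0, 0.0, 0.0, 1.0, 1.0]]
ones = [[1.0] * 3 for _ in range(3)]

local = dilated_conv2d(text_row, ones, dilation=1)
wide = dilated_conv2d(text_row, ones, dilation=3)
fused = multi_receptive_field(text_row, ones, dilations=[1, 3])
print(local[0][3], wide[0][3], fused[0][3])
```

Here the gap-centre pixel (column 3) scores 0.0 in the local branch and 2.0 in the wide branch, so the fused map bridges the gap. A learned module would use trained kernels and typically more channels and branches; summation is only one of several possible fusion choices (concatenation followed by a 1x1 convolution is another).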
updated: Tue Aug 03 2021 07:48:12 GMT+0000 (UTC)
published: Tue Aug 03 2021 07:48:12 GMT+0000 (UTC)