LRANet: Towards Accurate and Efficient Scene Text Detection with Low-Rank Approximation Network

Yuchen Su; Zhineng Chen; Zhiwen Shao; Yuning Du; Zhilong Ji; Jinfeng Bai; Yong Zhou; Yu-Gang Jiang

Regression-based methods, which predict parameterized text shapes for text localization, have recently gained popularity in scene text detection. However, existing parameterized text shape methods remain limited in modeling arbitrary-shaped text because they ignore text-specific shape information. Moreover, the time consumption of the entire pipeline has been largely overlooked, leading to suboptimal overall inference speed. To address these issues, we first propose a novel parameterized text shape method based on low-rank approximation. Unlike other shape representation methods that employ data-irrelevant parameterization, our approach uses singular value decomposition and reconstructs the text shape from a few eigenvectors learned from labeled text contours. By exploiting the shape correlation among different text contours, our method achieves consistency, compactness, simplicity, and robustness in shape representation. Next, we propose a dual assignment scheme to accelerate inference. It adopts a sparse assignment branch to speed up inference while providing ample supervision signals for training through a dense assignment branch. Building on these designs, we implement an accurate and efficient arbitrary-shaped text detector named LRANet. Extensive experiments on several challenging benchmarks demonstrate the superior accuracy and efficiency of LRANet compared with state-of-the-art methods. Code will be released soon.
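The low-rank shape representation described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names (`learn_basis`, `encode`, `decode`), the number of contour points, the rank, and the synthetic rotated ellipses standing in for labeled text contours are all assumptions made for illustration. The idea shown is only the general one: stack flattened contours into a matrix, take its singular value decomposition, keep the leading left singular vectors as a shape basis, and represent each contour by a few coefficients.

```python
import numpy as np

NUM_POINTS = 14  # assumed number of sampled points per contour (hypothetical)


def make_ellipse(a, b, theta, num_points=NUM_POINTS):
    """Synthetic stand-in for a labeled contour: a rotated ellipse,
    flattened as [x_1..x_N, y_1..y_N]."""
    t = np.linspace(0.0, 2.0 * np.pi, num_points, endpoint=False)
    x = a * np.cos(t) * np.cos(theta) - b * np.sin(t) * np.sin(theta)
    y = a * np.cos(t) * np.sin(theta) + b * np.sin(t) * np.cos(theta)
    return np.concatenate([x, y])


def learn_basis(contours, rank):
    """contours: (num_contours, 2 * num_points) array.
    Returns a (2 * num_points, rank) matrix with orthonormal columns."""
    # Put one contour per column, then keep the leading left singular vectors.
    u, _, _ = np.linalg.svd(contours.T, full_matrices=False)
    return u[:, :rank]


def encode(contour, basis):
    """Project a flattened contour onto the basis: `rank` coefficients."""
    return basis.T @ contour


def decode(coeffs, basis):
    """Reconstruct a flattened contour from its coefficients."""
    return basis @ coeffs


# Build a toy training set of contours and learn a rank-4 basis from it.
rng = np.random.default_rng(0)
train = np.stack([
    make_ellipse(rng.uniform(10, 50), rng.uniform(5, 20), rng.uniform(0, np.pi))
    for _ in range(200)
])
basis = learn_basis(train, rank=4)

# Encode and reconstruct a contour that was not in the training set.
test_contour = make_ellipse(33.0, 7.0, 0.4)
coeffs = encode(test_contour, basis)
recon = decode(coeffs, basis)
err = np.max(np.abs(recon - test_contour))
```

In this toy setting every centered, rotated ellipse lies in a four-dimensional linear subspace, so four basis vectors reconstruct an unseen ellipse up to floating-point error. Real text contours are not exactly low-rank, so the number of retained vectors would trade compactness against reconstruction accuracy.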

updated: Thu Aug 31 2023 05:48:06 GMT+0000 (UTC)

published: Tue Jun 27 2023 02:03:46 GMT+0000 (UTC)

arXiv
