Dynamic Low-Resolution Distillation for Cost-Efficient End-to-End Text Spotting

Ying Chen; Liang Qiao; Zhanzhan Cheng; Shiliang Pu; Yi Niu; Xi Li

End-to-end text spotting has attracted great attention recently due to its benefits for global optimization and its high maintainability in real applications. However, the input scale has always been a tough trade-off, since recognizing a small text instance usually requires enlarging the whole image, which brings high computational costs. In this paper, to address this problem, we propose a novel cost-efficient Dynamic Low-resolution Distillation (DLD) text spotting framework, which aims to run inference on images at different small but still recognizable resolutions and achieve a better balance between accuracy and efficiency. Concretely, we adopt a resolution selector to dynamically decide the input resolution for each image, constrained by both inference accuracy and computational cost. In addition, a sequential knowledge distillation strategy is applied to the text recognition branch, so that a low-res input obtains performance comparable to that of a high-res image. The proposed method can be optimized end-to-end and adopted in any current text spotting framework to improve its practicality. Extensive experiments on several text spotting benchmarks show that the proposed method vastly improves the usability of low-res models. The code is available at https://github.com/hikopensource/DAVAR-Lab-OCR/.
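
The trade-off that the resolution selector is described as balancing can be illustrated with a small sketch. This is not the authors' implementation (the abstract does not specify how the selector is built or trained); it only assumes that each candidate scale has a predicted recognition error and that compute cost grows roughly with pixel count. The names `relative_cost`, `select_scale`, and `cost_weight` are hypothetical.

```python
from typing import Sequence


def relative_cost(scale: float) -> float:
    """Assumed cost model: compute grows with the number of pixels (scale squared)."""
    return scale * scale


def select_scale(
    scales: Sequence[float],
    predicted_error: Sequence[float],
    cost_weight: float = 0.5,
) -> float:
    """Pick the scale with the lowest combined score.

    score = predicted_error + cost_weight * relative_cost(scale)

    Hypothetical stand-in for a learned selector: `predicted_error`
    would come from a model, here it is simply passed in.
    """
    if len(scales) != len(predicted_error):
        raise ValueError("scales and predicted_error must have the same length")
    if not scales:
        raise ValueError("at least one candidate scale is required")
    best = min(
        range(len(scales)),
        key=lambda i: predicted_error[i] + cost_weight * relative_cost(scales[i]),
    )
    return scales[best]


if __name__ == "__main__":
    candidates = [0.25, 0.5, 1.0]
    # Illustrative numbers only: small text is unreadable at 0.25x,
    # mostly readable at 0.5x, and slightly better at full resolution.
    errors = [0.9, 0.2, 0.1]
    print(select_scale(candidates, errors, cost_weight=0.5))
```

With a nonzero `cost_weight`, the sketch prefers the smallest scale whose predicted error is still acceptable; setting `cost_weight` to zero reduces it to picking the most accurate scale regardless of cost.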

updated: Thu Jul 14 2022 06:49:59 GMT+0000 (UTC)

published: Thu Jul 14 2022 06:49:59 GMT+0000 (UTC)

arXiv
