HiREN: Towards Higher Supervision Quality for Better Scene Text Image Super-Resolution

Minyi Zhao; Yi Xu; Bingjia Li; Jie Wang; Jihong Guan; Shuigeng Zhou

Scene text image super-resolution (STISR) is an important pre-processing technique for recognizing text in low-resolution scene images. Various methods have been proposed that extract text-specific information from high-resolution (HR) images to supervise STISR model training. However, because of uncontrollable factors in manually photographing HR images (e.g., shooting equipment, focus, and environment), the quality of HR images cannot be guaranteed, which unavoidably degrades STISR performance. Motivated by this quality issue, we propose to boost STISR by first enhancing the quality of the HR images and then using the enhanced images as supervision. Concretely, we develop a new STISR framework, called High-Resolution ENhancement (HiREN), that consists of two branches and a quality estimation module. The first branch recovers the low-resolution (LR) images. The second is an HR quality enhancement branch that generates high-quality (HQ) text images from the HR images in order to provide more accurate supervision for the LR images. Because the degradation from HQ to HR may be diverse, and there is no pixel-level supervision for HQ image generation, we design a kernel-guided enhancement network to handle various degradations, and we exploit feedback from a recognizer together with text-level annotations as weak supervision signals to train the HR enhancement branch. A quality estimation module then evaluates the quality of each HQ image, and the resulting quality scores are used to suppress erroneous supervision information by weighting the loss of each image. Extensive experiments on TextZoom show that HiREN works well with most existing STISR methods and significantly boosts their performance.
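
The per-image loss weighting mentioned at the end of the abstract can be illustrated with a minimal sketch. The abstract does not state the loss formula, so everything concrete here is an assumption made for illustration only: the mean-squared-error loss, quality scores in the range [0, 1], averaging over the batch, and the hypothetical names `mse` and `quality_weighted_loss`. It is not HiREN's actual implementation.

```python
from typing import Sequence

# Assumption: an image is represented as a flat sequence of pixel values.
Image = Sequence[float]


def mse(a: Image, b: Image) -> float:
    """Mean squared error between two flattened images of equal size."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("images must be non-empty and the same size")
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def quality_weighted_loss(
    sr_batch: Sequence[Image],
    hq_batch: Sequence[Image],
    quality_scores: Sequence[float],
) -> float:
    """Average the per-image losses, each scaled by its quality score.

    sr_batch: super-resolved outputs of the LR recovery branch.
    hq_batch: enhanced (HQ) images used as supervision.
    quality_scores: hypothetical per-image scores in [0, 1]; a low score
        shrinks the contribution of an unreliable HQ image.
    """
    if not (len(sr_batch) == len(hq_batch) == len(quality_scores)):
        raise ValueError("all inputs must have the same batch size")
    if len(sr_batch) == 0:
        raise ValueError("batch must not be empty")
    per_image = [mse(sr, hq) for sr, hq in zip(sr_batch, hq_batch)]
    weighted = [w * loss for w, loss in zip(quality_scores, per_image)]
    return sum(weighted) / len(weighted)


if __name__ == "__main__":
    sr = [[0.0, 0.0], [1.0, 1.0]]
    hq = [[1.0, 1.0], [1.0, 0.0]]
    # The second HQ image is judged unreliable (score 0), so only the
    # first image contributes to the loss.
    print(quality_weighted_loss(sr, hq, [1.0, 0.0]))
```

With a score of 0, the second image's supervision is ignored entirely; intermediate scores would scale its contribution down proportionally.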

updated: Mon Jul 31 2023 05:32:57 GMT+0000 (UTC)

published: Mon Jul 31 2023 05:32:57 GMT+0000 (UTC)

arXiv
