The Surprisingly Straightforward Scene Text Removal Method With Gated Attention and Region of Interest Generation: A Comprehensive Prominent Model Analysis

Hyeonsu Lee; Chankyu Choi

Scene text removal (STR), the task of erasing text from natural scene images, has recently attracted attention as an important component of text editing and of concealing private information such as ID, telephone, and license plate numbers. Although a variety of STR methods are being actively researched, it is difficult to compare them because previously proposed methods do not use the same standardized training/evaluation dataset. We re-implement several previous methods in a standardized way and evaluate their performance on the same standardized training/testing dataset. We also introduce a simple yet extremely effective Gated Attention (GA) and Region-of-Interest Generation (RoIG) methodology. GA uses attention to focus on the text strokes as well as the textures and colors of the surrounding regions, so that text is removed from the input image much more precisely. RoIG focuses on only the regions containing text instead of the entire image, so that the model trains more efficiently. Experimental results on the benchmark dataset show that our method significantly outperforms existing state-of-the-art methods on almost all metrics and produces remarkably higher-quality results. Furthermore, because our model does not explicitly generate a text stroke mask, it needs no additional refinement steps or sub-models, which makes it extremely fast with fewer parameters. The dataset and code are available at https://github.com/naver/garnet.
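
The abstract does not say how RoIG is implemented. As a rough illustration only, one common way to make a model "focus on only the region with text" is to composite the generated pixels into the input inside the text boxes and to compute the reconstruction loss over those pixels alone. The sketch below assumes that reading; the names `box_mask`, `composite`, and `masked_l1` are hypothetical and are not taken from the paper or its repository, and plain Python lists stand in for image tensors.

```python
from typing import List, Sequence, Tuple

Image = List[List[float]]          # grayscale image as rows of pixel intensities
Mask = List[List[int]]             # 1 inside a text box, 0 elsewhere
Box = Tuple[int, int, int, int]    # (top, left, bottom, right), end-exclusive


def box_mask(height: int, width: int, boxes: Sequence[Box]) -> Mask:
    """Rasterize text boxes into a binary mask, clipping them to the image."""
    mask = [[0] * width for _ in range(height)]
    for top, left, bottom, right in boxes:
        for y in range(max(top, 0), min(bottom, height)):
            for x in range(max(left, 0), min(right, width)):
                mask[y][x] = 1
    return mask


def composite(source: Image, generated: Image, mask: Mask) -> Image:
    """Keep generated pixels inside the mask and source pixels outside it."""
    return [
        [g if m else s for s, g, m in zip(src_row, gen_row, mask_row)]
        for src_row, gen_row, mask_row in zip(source, generated, mask)
    ]


def masked_l1(prediction: Image, target: Image, mask: Mask) -> float:
    """Mean absolute error over masked pixels only (0.0 if the mask is empty)."""
    total, count = 0.0, 0
    for p_row, t_row, m_row in zip(prediction, target, mask):
        for p, t, m in zip(p_row, t_row, m_row):
            if m:
                total += abs(p - t)
                count += 1
    return total / count if count else 0.0


if __name__ == "__main__":
    source = [[1.0] * 4 for _ in range(3)]      # stand-in for the input image
    generated = [[0.0] * 4 for _ in range(3)]   # stand-in for a model output
    mask = box_mask(3, 4, [(0, 1, 2, 3)])       # one text box: rows 0-1, cols 1-2
    for row in composite(source, generated, mask):
        print(row)
```

With these toy inputs, the first two rows print as `[1.0, 0.0, 0.0, 1.0]` and the last row stays `[1.0, 1.0, 1.0, 1.0]`: pixels outside the box are copied from the input unchanged, so neither the output nor a loss such as `masked_l1` is affected by what the model produces there.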

updated: Wed Nov 23 2022 12:39:12 GMT+0000 (UTC)

published: Fri Oct 14 2022 03:34:21 GMT+0000 (UTC)

arXiv
