1st Place Solution to ECCV 2022 Challenge on Out of Vocabulary Scene Text Understanding: Cropped Word Recognition

Zhangzi Zhu; Yu Hao; Wenqing Zhang; Chuhui Xue; Song Bai

This report presents our winning solution to the ECCV 2022 challenge on Out-of-Vocabulary Scene Text Understanding (OOV-ST): Cropped Word Recognition. The challenge is held in the context of the ECCV 2022 workshop on Text in Everything (TiE) and aims to extract out-of-vocabulary words from natural scene images. In the competition, we first pre-train SCATTER on synthetic datasets, then fine-tune the model on the training set with data augmentation. In addition, two further models are trained specifically for long and vertical texts. Finally, we combine the outputs of models with different numbers of layers, different backbones, and different random seeds to produce the final results. Our solution achieves an overall word accuracy of 69.73% when considering both in-vocabulary and out-of-vocabulary words.
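The abstract does not say how the model outputs are combined or exactly how word accuracy is scored, so the following is only a minimal sketch under stated assumptions: each model is assumed to return a (text, confidence) pair per cropped word, the ensemble is assumed to be a majority vote with ties broken by the highest confidence, and word accuracy is assumed to be the fraction of exact string matches. The names `ensemble_vote` and `word_accuracy` are hypothetical and not taken from the report.

```python
from collections import defaultdict


def ensemble_vote(predictions):
    """Pick one transcription for a single cropped word.

    `predictions` is a list of (text, confidence) pairs, one per model.
    Assumption: majority vote, ties broken by the highest confidence.
    This is an illustration, not the combination rule used in the report.
    """
    if not predictions:
        raise ValueError("at least one prediction is required")
    votes = defaultdict(int)
    best_conf = defaultdict(float)
    for text, conf in predictions:
        votes[text] += 1
        best_conf[text] = max(best_conf[text], conf)
    # Rank candidates by vote count first, then by their best confidence.
    return max(votes, key=lambda t: (votes[t], best_conf[t]))


def word_accuracy(predicted, ground_truth):
    """Fraction of words whose prediction matches the label exactly.

    Assumption: case-sensitive exact match over the whole word.
    """
    if len(predicted) != len(ground_truth):
        raise ValueError("predicted and ground_truth must have equal length")
    if not ground_truth:
        return 0.0
    correct = sum(p == g for p, g in zip(predicted, ground_truth))
    return correct / len(ground_truth)
```

For example, `ensemble_vote([("hello", 0.9), ("hallo", 0.95), ("hello", 0.6)])` selects `"hello"` because two of the three hypothetical models agree, even though the dissenting model is more confident.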

updated: Tue Aug 23 2022 06:51:25 GMT+0000 (UTC)

published: Thu Aug 04 2022 16:20:58 GMT+0000 (UTC)

arXiv
