Runner-Up Solution to ECCV 2022 Challenge on Out of Vocabulary Scene Text Understanding: Cropped Word Recognition

Zhangzi Zhu; Yu Hao; Wenqing Zhang; Chuhui Xue; Song Bai

This report presents our 2nd-place solution to the ECCV 2022 challenge on Out-of-Vocabulary Scene Text Understanding (OOV-ST): Cropped Word Recognition. The challenge is held in the context of the ECCV 2022 workshop on Text in Everything (TiE), which aims to extract out-of-vocabulary words from natural scene images. In the competition, we first pre-train SCATTER on synthetic datasets, then fine-tune the model on the training set with data augmentation. In addition, two further models are trained specifically for long and vertical texts. Finally, we combine the outputs of models with different layers, different backbones, and different seeds to produce the final results. Our solution achieves a word accuracy of 59.45% when considering out-of-vocabulary words only.
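The abstract does not say how the model outputs are combined or exactly how word accuracy is scored. As a rough illustration only, the sketch below assumes confidence-weighted voting over each model's predicted string and exact-match word accuracy; the function names `ensemble_vote` and `word_accuracy` are hypothetical and are not taken from the report.

```python
from collections import defaultdict


def ensemble_vote(predictions):
    """Pick one transcription from several models' outputs.

    `predictions` is a list of (text, confidence) pairs, one per model.
    Assumption: confidences are summed per distinct string and the
    string with the highest total wins. The report may use another rule.
    """
    scores = defaultdict(float)
    for text, confidence in predictions:
        scores[text] += confidence
    return max(scores, key=scores.get)


def word_accuracy(predicted, targets):
    """Fraction of words whose prediction matches the target exactly.

    Assumption: exact, case-sensitive string match. To score only
    out-of-vocabulary words, pass in just that subset.
    """
    if not targets:
        return 0.0
    correct = sum(p == t for p, t in zip(predicted, targets))
    return correct / len(targets)


if __name__ == "__main__":
    # Three hypothetical models disagree on one cropped word.
    outputs = [("cat", 0.6), ("cot", 0.9), ("cat", 0.5)]
    print(ensemble_vote(outputs))
    print(word_accuracy(["cat", "dog"], ["cat", "dot"]))
```

Summing confidences lets two moderately confident models that agree outvote a single more confident model, which is one common motivation for ensembling models trained with different seeds.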

updated: Wed Aug 31 2022 13:00:42 GMT+0000 (UTC)

published: Thu Aug 04 2022 16:20:58 GMT+0000 (UTC)

arXiv
