Task Grouping for Multilingual Text Recognition

Jing Huang; Kevin J Liang; Rama Kovvuri; Tal Hassner


Most existing OCR methods focus on alphanumeric characters because of the popularity of English and numerals, as well as of their corresponding datasets. When extending the character set to more languages, recent methods have shown that training different scripts with separate recognition heads can greatly improve end-to-end recognition accuracy compared with combining the characters of all languages in a single recognition head. However, we postulate that similarities between some languages could allow them to share model parameters and benefit from joint training. Which languages should be grouped together, though, is not immediately obvious. To this end, we propose an automatic method for multilingual text recognition with a task grouping and assignment module using Gumbel-Softmax, and we introduce a task grouping loss and a weighted recognition loss that allow the models and the grouping modules to be trained simultaneously. Experiments on MLT19 lend evidence to our hypothesis that a middle ground between combining all tasks together and separating every task achieves a better configuration of task grouping/separation.
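
The abstract does not give the module's equations, so the following is only a minimal sketch of the general Gumbel-Softmax idea, not the authors' implementation. It assumes that each task (script) owns a learnable logit vector over G candidate groups, each group having its own recognition head. A relaxed one-hot assignment is drawn as softmax((logits + g) / tau) with g ~ Gumbel(0, 1). The function names, the temperature value, and the form of the weighted loss (each task's per-head loss scaled by its assignment weight) are hypothetical. The task grouping loss is not sketched because the abstract does not specify its form.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def gumbel_softmax(logits, tau=1.0, rng=random):
    """Relaxed one-hot sample over groups: softmax((logits + g) / tau), g ~ Gumbel(0, 1)."""
    noisy = []
    for logit in logits:
        # Clamp u away from 0 and 1 so both logarithms stay finite.
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
        g = -math.log(-math.log(u))  # standard Gumbel sample
        noisy.append((logit + g) / tau)
    return softmax(noisy)


def hard_assignment(probs):
    """Index of the group with the largest relaxed weight."""
    return max(range(len(probs)), key=lambda g: probs[g])


def weighted_recognition_loss(assign_probs, head_losses):
    """Hypothetical weighting: each task's loss under each head, scaled by its assignment weight.

    assign_probs[t][g]: weight of task t on group g (each row sums to 1).
    head_losses[t][g]: recognition loss of task t's samples under group g's head.
    """
    return sum(
        p * loss
        for probs, losses in zip(assign_probs, head_losses)
        for p, loss in zip(probs, losses)
    )


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical logits for 3 scripts over 2 groups; learnable in a real model.
    task_logits = [[2.0, -1.0], [1.5, 0.0], [-2.0, 2.0]]
    probs = [gumbel_softmax(row, tau=0.5, rng=rng) for row in task_logits]
    print("groups:", [hard_assignment(p) for p in probs])
    # Placeholder per-head losses, made up purely for illustration.
    losses = [[0.4, 1.2], [0.6, 0.9], [1.5, 0.3]]
    print("weighted loss:", weighted_recognition_loss(probs, losses))
```

Here the logits are fixed, so nothing is learned. In an autodiff framework, `torch.nn.functional.gumbel_softmax(logits, tau=..., hard=True)` returns a one-hot sample in the forward pass while passing gradients through the soft weights. This is one common way to let a discrete grouping decision be trained jointly with the recognition heads.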

Published on arXiv: Thu Oct 13 2022 23:54:23 GMT+0000 (UTC)

Updated: Thu Oct 13 2022 23:54:23 GMT+0000 (UTC)
