Transfer Learning for Scene Text Recognition in Indian Languages

Sanjana Gunna; Rohit Saluja; C. V. Jawahar

Scene text recognition in low-resource Indian languages is challenging because of complexities like multiple scripts, fonts, text sizes, and orientations. In this work, we investigate the power of transfer learning for all the layers of deep scene text recognition networks from English to two common Indian languages. We perform experiments on the conventional CRNN model and on STAR-Net to ensure generalisability. To study the effect of changing scripts, we initially run our experiments on synthetic word images rendered using Unicode fonts. We show that transferring English models to simple synthetic datasets of Indian languages is not practical. Instead, we propose to apply transfer learning among Indian languages because of the similarity in their n-gram distributions and in visual features such as vowels and conjunct characters. We then study transfer learning among six Indian languages with varying complexities in fonts and word-length statistics. We also demonstrate that the learned features of models transferred from other Indian languages are visually closer to the features of the individually trained models (and sometimes even better) than those of models transferred from English. Finally, we set new benchmarks for scene text recognition on the Hindi, Telugu, and Malayalam datasets from IIIT-ILST and the Bangla dataset from MLT-17, achieving gains in Word Recognition Rate (WRR) of 6%, 5%, 2%, and 23%, respectively, over previous works. We further improve the MLT-17 Bangla results by plugging a novel correction BiLSTM into our model. We additionally release a dataset of around 440 scene images containing 500 Gujarati and 2535 Tamil words. WRRs improve over the baselines by 8%, 4%, 5%, and 3% on the MLT-19 Hindi and Bangla datasets and the Gujarati and Tamil datasets, respectively.
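
The results above are reported as Word Recognition Rates (WRRs). As the metric is commonly defined in scene text recognition, WRR is the fraction of test word images whose predicted transcription exactly matches the ground-truth word. The following is a minimal sketch under stated assumptions, not the authors' evaluation code: the function name `word_recognition_rate` is hypothetical, and the Unicode NFC normalisation step is an assumption added so that canonically equivalent encodings (common in Indic scripts, e.g. precomposed versus decomposed nukta forms) are treated as the same word. The paper's own protocol may handle normalisation differently.

```python
import unicodedata


def word_recognition_rate(predictions, ground_truths):
    """Hypothetical helper: fraction of words predicted exactly right.

    Assumption (not from the paper): both strings are NFC-normalised before
    comparison, so canonically equivalent Unicode sequences count as equal.
    """
    if len(predictions) != len(ground_truths):
        raise ValueError("predictions and ground_truths must have the same length")
    if not ground_truths:
        raise ValueError("cannot compute WRR on an empty set")
    # A word counts as correct only if the whole normalised string matches.
    correct = sum(
        unicodedata.normalize("NFC", p) == unicodedata.normalize("NFC", g)
        for p, g in zip(predictions, ground_truths)
    )
    return correct / len(ground_truths)


if __name__ == "__main__":
    preds = ["नमस्ते", "भारत", "हिंदी", "तमिल"]
    gts = ["नमस्ते", "भारत", "हिन्दी", "तमिल"]  # third word differs (anusvara vs. half-na)
    print(word_recognition_rate(preds, gts))  # prints 0.75
```

Because WRR gives no partial credit, a single wrong conjunct or matra makes the whole word incorrect, which is why it is a stricter measure than character-level accuracy for scripts with long, complex words.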

updated: Mon Jan 10 2022 06:14:49 GMT+0000 (UTC)

published: Mon Jan 10 2022 06:14:49 GMT+0000 (UTC)

arXiv
