Towards the Unseen: Iterative Text Recognition by Distilling from Errors

Ayan Kumar Bhunia; Pinaki Nath Chowdhury; Aneeshan Sain; Yi-Zhe Song

Visual text recognition is undoubtedly one of the most extensively researched topics in computer vision. Great progress has been made to date, with the latest models starting to focus on the more practical "in-the-wild" setting. However, a salient problem still hinders practical deployment: prior art mostly struggles to recognise unseen (or rarely seen) character sequences. In this paper, we put forward a novel framework to specifically tackle this "unseen" problem. Our framework is iterative in nature, in that it utilises the predicted knowledge of character sequences from the previous iteration to augment the main network in improving the next prediction. Key to our success is a unique cross-modal variational autoencoder that acts as a feedback module and is trained in the presence of textual error distribution data. Importantly, this module translates a discrete predicted character space into a continuous affine transformation parameter space, which is used to condition the visual feature map at the next iteration. Experiments on common datasets have shown competitive performance against the state of the art under the conventional setting. Most importantly, under the new disjoint setup where train and test labels are mutually exclusive, ours offers the best performance, thus showcasing the capability of generalising to unseen words.
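
The abstract only states that the feedback module maps a discrete character prediction to continuous affine parameters that condition the visual feature map at the next iteration. The Python sketch below illustrates that idea under explicit assumptions: it uses FiLM-style channel-wise scale and shift (gamma, beta), replaces the cross-modal VAE encoder with a toy normalised character histogram plus the standard VAE reparameterisation trick, and uses hypothetical names (`encode_prediction`, `feedback_step`, `iterative_recognition`, the weight matrices `wg`/`wb`, the `recognise` callable). It is not the authors' implementation, and the training on textual error distribution data is not modelled.

```python
import math
import random

# Hypothetical toy size; the abstract gives no real dimensions.
NUM_CLASSES = 4  # size of the predicted character vocabulary (= latent size here)


def encode_prediction(char_ids, num_classes=NUM_CLASSES):
    """Hypothetical stand-in for the VAE encoder mean: normalised character histogram."""
    hist = [0.0] * num_classes
    for c in char_ids:
        hist[c] += 1.0
    n = max(len(char_ids), 1)
    return [h / n for h in hist]


def reparameterize(mu, log_var, rng):
    """Standard VAE reparameterisation: z = mu + exp(0.5 * log_var) * eps, eps ~ N(0, 1)."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, log_var)]


def film_params(z, wg, wb):
    """Linear heads mapping latent z to per-channel scale (gamma, centred on 1) and shift (beta)."""
    gamma = [1.0 + sum(w * zi for w, zi in zip(row, z)) for row in wg]
    beta = [sum(w * zi for w, zi in zip(row, z)) for row in wb]
    return gamma, beta


def modulate(feature_map, gamma, beta):
    """Channel-wise affine transform: F'_c = gamma_c * F_c + beta_c."""
    return [[gamma[c] * v + beta[c] for v in channel]
            for c, channel in enumerate(feature_map)]


def feedback_step(feature_map, char_ids, wg, wb, rng=None, log_var_value=-4.0):
    """Condition the feature map on the previous prediction (hypothetical feedback module)."""
    mu = encode_prediction(char_ids, len(wg[0]))
    if rng is None:
        z = mu  # use the latent mean, as is common at inference time
    else:
        # Sample z with a fixed, hypothetical log-variance, as during VAE training.
        z = reparameterize(mu, [log_var_value] * len(mu), rng)
    gamma, beta = film_params(z, wg, wb)
    return modulate(feature_map, gamma, beta)


def iterative_recognition(feature_map, recognise, wg, wb, num_iters=2):
    """Predict, feed the prediction back as affine conditioning, then predict again."""
    prediction = recognise(feature_map)
    for _ in range(num_iters - 1):
        # Each iteration conditions the original visual features on the latest prediction.
        conditioned = feedback_step(feature_map, prediction, wg, wb)
        prediction = recognise(conditioned)
    return prediction


# Toy usage: 3 channels x 2 spatial positions, latent size 4 (all values made up).
fm = [[1.0, 2.0], [0.0, 0.0], [-1.0, 1.0]]
wg = [[0.0] * 4 for _ in range(3)]   # zero gamma heads -> gamma = 1
wb = [[1.0, 0.0, 0.0, 0.0],          # hypothetical beta heads
      [0.0, 1.0, 0.0, 0.0],
      [0.0, 0.0, 0.0, 1.0]]
print(feedback_step(fm, [0, 1, 1, 3], wg, wb))  # [[1.25, 2.25], [0.5, 0.5], [-0.75, 1.25]]
```

With all-zero weights the transform reduces to the identity (gamma = 1, beta = 0), so the feedback has no effect; in a trained model it is the learned mapping that would let the previous prediction reshape the visual features. Centring gamma on 1 is a common FiLM convention chosen here so that an untrained head starts as a no-op.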

updated: Mon Jul 26 2021 10:06:42 GMT+0000 (UTC)

published: Mon Jul 26 2021 10:06:42 GMT+0000 (UTC)

arXiv