MetaHTR: Towards Writer-Adaptive Handwritten Text Recognition

Ayan Kumar Bhunia; Shuvozit Ghose; Amandeep Kumar; Pinaki Nath Chowdhury; Aneeshan Sain; Yi-Zhe Song

Handwritten Text Recognition (HTR) remains a challenging problem to date, largely due to the varying writing styles that exist amongst us. Prior works, however, generally operate on the assumption that there is a limited number of styles, most of which have already been captured by existing datasets. In this paper, we take a completely different perspective: we work on the assumption that there is always a new style that is drastically different, and that we will only have very limited data during testing to perform adaptation. This results in a commercially viable solution: the model has the best shot at adaptation because it is exposed to the new style, and the few-sample nature makes it practical to implement. We achieve this via a novel meta-learning framework which exploits additional new-writer data through a support set, and outputs a writer-adapted model via a single gradient step update, all during inference. We discover and leverage the important insight that there exist a few key characters per writer that exhibit relatively larger style discrepancies. For that, we additionally propose to meta-learn instance-specific weights for a character-wise cross-entropy loss, which is specifically designed to work with the sequential nature of text data. Our writer-adaptive MetaHTR framework can be easily implemented on top of most state-of-the-art HTR models. Experiments show that an average performance gain of 5-7% can be obtained by observing very few samples of a new style. We further demonstrate via a set of ablative studies the advantage of our meta design when compared with alternative adaptation mechanisms.
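
The adaptation step described in the abstract (one gradient update on a small support set, driven by a per-character weighted cross-entropy loss) can be illustrated with a toy sketch. This is not the authors' implementation. It assumes a plain linear softmax classifier over hand-made character features, and it takes the per-character weights as fixed inputs, whereas the paper meta-learns them. All names (`adapt_one_step`, `weighted_char_ce`, `char_weights`, the toy data) are hypothetical.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def predict(W, x):
    """Class probabilities of a linear softmax classifier for one character feature x."""
    logits = [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) for row in W]
    return softmax(logits)

def weighted_char_ce(W, feats, labels, weights):
    """Weighted sum of per-character cross-entropy losses over a sequence."""
    return sum(
        w * -math.log(predict(W, x)[y])
        for x, y, w in zip(feats, labels, weights)
    )

def adapt_one_step(W, feats, labels, weights, lr=0.5):
    """Return a new weight matrix after ONE gradient step on the weighted loss.

    For a linear softmax model the gradient of cross-entropy with respect to
    row c of W is (p_c - 1[c == y]) * x, scaled here by the character weight.
    """
    grad = [[0.0] * len(W[0]) for _ in W]
    for x, y, w in zip(feats, labels, weights):
        p = predict(W, x)
        for c in range(len(W)):
            err = p[c] - (1.0 if c == y else 0.0)
            for j, x_j in enumerate(x):
                grad[c][j] += w * err * x_j
    return [
        [W[c][j] - lr * grad[c][j] for j in range(len(W[0]))]
        for c in range(len(W))
    ]

# Toy support set for a new writer: three characters, two classes.
W0 = [[0.0, 0.0], [0.0, 0.0]]            # stand-in for the pre-trained model
support_feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
support_labels = [0, 1, 0]
char_weights = [1.0, 1.0, 2.0]           # assumed given; meta-learned in the paper

W1 = adapt_one_step(W0, support_feats, support_labels, char_weights)
loss_before = weighted_char_ce(W0, support_feats, support_labels, char_weights)
loss_after = weighted_char_ce(W1, support_feats, support_labels, char_weights)
print(loss_before, loss_after)
```

In this toy setting the third character carries twice the weight of the others, so it contributes twice as much to the single update. That mirrors the idea of emphasising the few characters where a writer's style deviates most, without reproducing how the paper derives those weights.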

updated: Mon Apr 05 2021 12:35:39 GMT+0000 (UTC)

published: Mon Apr 05 2021 12:35:39 GMT+0000 (UTC)

arXiv
