Full Page Handwriting Recognition via Image to Sequence Extraction

Sumeet S. Singh; Sergey Karayev

We present a neural-network-based handwritten text recognition (HTR) model architecture that can be trained to recognize full pages of handwritten or printed text without image segmentation. Because it is based on an image-to-sequence architecture, it can extract the text present in an image and sequence it correctly without imposing any constraints on the orientation, layout, or size of text and non-text. It can also be trained to generate auxiliary markup related to formatting, layout, and content. We use a character-level vocabulary, which accommodates the language and terminology of any subject. The model achieves a new state of the art in paragraph-level recognition on the IAM dataset. When evaluated on scans of real-world handwritten free-form test answers (beset with curved and slanted lines, drawings, tables, math, chemistry, and other symbols), it performs better than all commercially available HTR cloud APIs. It is deployed in production as part of a commercial web application.
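To illustrate what a character-level vocabulary means in practice, here is a minimal Python sketch. It is not the authors' implementation: the special tokens (`<pad>`, `<sos>`, `<eos>`, `<unk>`) and the function names `build_vocab`, `encode`, and `decode` are assumptions made for this example. The point is only that every distinct character, including digits, symbols, and line breaks, gets its own id, so no word list restricts the subject matter.

```python
from typing import Dict, List

# Hypothetical special tokens; the abstract does not name the actual ones.
SPECIALS = ["<pad>", "<sos>", "<eos>", "<unk>"]


def build_vocab(texts: List[str]) -> Dict[str, int]:
    """Map every special token and every distinct character to an integer id."""
    chars = sorted({ch for text in texts for ch in text})
    return {tok: i for i, tok in enumerate(SPECIALS + chars)}


def encode(text: str, vocab: Dict[str, int]) -> List[int]:
    """Convert text to character ids wrapped in start and end markers."""
    unk = vocab["<unk>"]
    body = [vocab.get(ch, unk) for ch in text]
    return [vocab["<sos>"]] + body + [vocab["<eos>"]]


def decode(ids: List[int], vocab: Dict[str, int]) -> str:
    """Invert encode, dropping the special tokens."""
    inverse = {i: tok for tok, i in vocab.items()}
    return "".join(inverse[i] for i in ids if inverse[i] not in SPECIALS)


# Usage: line breaks and chemistry-style strings are handled like any other text.
vocab = build_vocab(["H2O + heat\nx = 2"])
ids = encode("H2O\nx", vocab)
print(decode(ids, vocab))
```

In a sequence model of this kind, the decoder would emit such ids one at a time until it produces the end marker; characters unseen when the vocabulary was built fall back to `<unk>` in this sketch.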

updated: Sun Jun 26 2022 21:01:23 GMT+0000 (UTC)

published: Thu Mar 11 2021 04:37:29 GMT+0000 (UTC)

arXiv
