Illegible Text to Readable Text: An Image-to-Image Transformation using Conditional Sliced Wasserstein Adversarial Networks

Mostafa Karimi; Gopalkrishna Veni; Yen-Yun Yu

Automatic text recognition from images of ancient handwritten records is an important problem in the genealogy domain. However, critical challenges such as varying noise conditions, vanishing text, and variations in handwriting make the recognition task difficult. We tackle this problem by developing a handwritten-to-machine-print conditional generative adversarial network (HW2MP-GAN) model that formulates handwriting recognition as a text-image-to-text-image translation problem, in which a given image, typically in an illegible form, is converted into another image close to its machine-print form. The proposed model consists of three components: a generator, a word-level discriminator, and a character-level discriminator. The model incorporates the sliced Wasserstein distance (SWD) and U-Net architectures for higher-quality image-to-image transformation. Our experiments reveal that HW2MP-GAN outperforms state-of-the-art baseline cGAN models by almost 30 in Fréchet Handwritten Distance (FHD), 0.6 in average Levenshtein distance, and 39% in word accuracy for image-to-image translation on the IAM database. Further, HW2MP-GAN improves handwriting recognition word accuracy by 1.3% compared to baseline handwriting recognition models on the IAM database.
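
Two of the metrics named in the abstract, average Levenshtein distance and word accuracy, can be illustrated with a short sketch. This is not the authors' evaluation code: the function names `levenshtein` and `evaluate` are hypothetical, and the sketch assumes that word accuracy means the fraction of words recognized exactly and that the average distance is the unnormalized edit distance averaged over word pairs. The abstract does not specify either definition.

```python
from typing import Sequence, Tuple


def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions,
    or substitutions needed to turn string a into string b."""
    # Keep the shorter string in the inner loop to use less memory.
    if len(a) < len(b):
        a, b = b, a
    # previous[j] holds the distance between the processed prefix of a and b[:j].
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute (or match)
            ))
        previous = current
    return previous[-1]


def evaluate(predicted: Sequence[str], reference: Sequence[str]) -> Tuple[float, float]:
    """Return (average Levenshtein distance, word accuracy).

    Assumption: word accuracy counts a word as correct only on an
    exact match, i.e. when its edit distance to the reference is 0.
    """
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have the same length")
    if not reference:
        raise ValueError("at least one word pair is required")
    distances = [levenshtein(p, r) for p, r in zip(predicted, reference)]
    average_distance = sum(distances) / len(distances)
    word_accuracy = sum(d == 0 for d in distances) / len(distances)
    return average_distance, word_accuracy


if __name__ == "__main__":
    # Hypothetical recognizer output compared with ground-truth words.
    avg, acc = evaluate(["hand", "writen", "text"], ["hand", "written", "text"])
    print(f"average Levenshtein distance: {avg:.3f}, word accuracy: {acc:.3f}")
```

Under these assumptions, a lower average distance and a higher word accuracy both indicate that the text read from the generated machine-print images is closer to the ground truth.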

updated: Fri Oct 11 2019 22:01:24 GMT+0000 (UTC)

published: Fri Oct 11 2019 22:01:24 GMT+0000 (UTC)

arXiv
