Pho(SC)-CTC -- A Hybrid Approach Towards Zero-shot Word Image Recognition

Ravi Bhatt; Anuj Rai; Narayanan C. Krishnan; Sukalpa Chanda


Annotating words in a historical document image archive for word image recognition demands time and skilled human resources (e.g., historians and paleographers). In a real-life scenario, obtaining sample images for every possible word is also not feasible. Zero-shot learning methods, however, can be used to recognize unseen/out-of-lexicon words in such historical document images. Building on Pho(SC)Net, the previous state-of-the-art method for zero-shot word recognition, we propose a hybrid model, Pho(SC)-CTC, that takes advantage of the rich features learned by Pho(SC)Net and then uses a connectionist temporal classification (CTC) framework to perform the final classification. Encouraging results were obtained on two publicly available historical document datasets and one synthetic handwritten dataset, supporting the efficacy of both Pho(SC)-CTC and Pho(SC)Net.
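The abstract does not describe how the CTC output is decoded into a word. As a generic illustration only (not the authors' code), a CTC layer is commonly decoded with best-path (greedy) decoding: take the most probable class at each time step, collapse consecutive repeats, then drop the blank symbol. The function name `ctc_greedy_decode`, the convention that class 0 is the blank, and the toy probabilities are assumptions made for this sketch.

```python
from typing import List, Sequence


def ctc_greedy_decode(frame_probs: Sequence[Sequence[float]], alphabet: str) -> str:
    """Best-path CTC decoding (hypothetical helper, for illustration only).

    Assumes frame_probs[t][k] is the score of class k at time step t, where
    class 0 is the CTC blank and class k >= 1 maps to alphabet[k - 1].
    """
    # 1. Most probable class per time step.
    best = [max(range(len(p)), key=lambda k: p[k]) for p in frame_probs]

    # 2. Collapse repeated classes, then 3. remove blanks.
    chars: List[str] = []
    prev = None
    for k in best:
        if k != prev and k != 0:
            chars.append(alphabet[k - 1])
        prev = k
    return "".join(chars)


# Toy example: per-frame argmaxes are a, a, blank, a, b, blank.
probs = [
    [0.1, 0.8, 0.05, 0.05],
    [0.2, 0.7, 0.05, 0.05],
    [0.9, 0.05, 0.03, 0.02],
    [0.1, 0.6, 0.2, 0.1],
    [0.1, 0.1, 0.7, 0.1],
    [0.8, 0.1, 0.05, 0.05],
]
print(ctc_greedy_decode(probs, "abc"))  # aab
```

The blank between the two runs of `a` is what lets CTC emit a genuinely doubled letter; without it, the repeats would collapse into a single `a`.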

updated: Wed Dec 21 2022 08:21:05 GMT+0000 (UTC)

published: Mon May 31 2021 16:22:33 GMT+0000 (UTC)

arXiv
