Towards Robust Handwritten Text Recognition with On-the-fly User Participation

Ajoy Mondal; Rohit Saluja; C. V. Jawahar

Long-term OCR services aim to provide high-quality output to their users at competitive costs. Because the data that users upload is complex, the models must be upgraded regularly. Service providers encourage users to contribute data on which the OCR model fails, rewarding them based on data complexity, readability, and the available budget. Previous OCR work has prepared models on standard datasets without considering the end users. We propose a strategy for consistently upgrading an existing handwritten Hindi OCR model three times on a dataset of 15 users. We fix a budget of 4 users for each iteration. In the first iteration, the model is trained directly on the data from the first four users. In each remaining iteration, all remaining users write one page each, which the service provider then analyzes to select the 4 (new) best users based on the quality of predictions on the human-readable words. The selected users write 23 more pages for upgrading the model. We upgrade the model with Curriculum Learning (CL) on the data available in the current iteration and compare the subset from previous iterations. The upgraded model is tested on a held-out set of one page each from all 23 users. We provide insights from our investigations into the effect of CL, user selection, and especially data from unseen writing styles. Our work can be used for long-term OCR services in crowd-sourcing scenarios involving service providers and end users.
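The budgeted selection loop described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the names `select_users`, `plan_iterations`, `score_fn`, and `toy_scores` are hypothetical, the score is a stand-in for "quality of predictions on the human-readable words", and treating the highest score as "best" is an assumption. In a real service the scores would presumably be recomputed with the upgraded model before each selection round; the sketch keeps them fixed for brevity.

```python
from typing import Callable, List, Sequence


def select_users(candidates: Sequence[str],
                 score_fn: Callable[[str], float],
                 budget: int = 4) -> List[str]:
    """Return the `budget` candidates with the highest score.

    Assumption: a higher score means a "better" user. The abstract does
    not define the exact criterion, so `score_fn` is a placeholder.
    """
    ranked = sorted(candidates, key=score_fn, reverse=True)
    return ranked[:budget]


def plan_iterations(users: Sequence[str],
                    score_fn: Callable[[str], float],
                    iterations: int = 3,
                    budget: int = 4) -> List[List[str]]:
    """Choose which users contribute training pages in each iteration."""
    remaining = list(users)
    plan: List[List[str]] = []
    for iteration in range(iterations):
        if iteration == 0:
            # First iteration: take the first `budget` users directly.
            chosen = remaining[:budget]
        else:
            # Later iterations: rank the remaining users by their score
            # on the sample page and keep the top `budget` new users.
            chosen = select_users(remaining, score_fn, budget)
        plan.append(chosen)
        remaining = [u for u in remaining if u not in chosen]
    return plan


# Toy data (hypothetical): 15 users with made-up scores.
users = [f"u{i}" for i in range(1, 16)]
toy_scores = {user: float(index) for index, user in enumerate(users)}
plan = plan_iterations(users, score_fn=lambda u: toy_scores[u])
```

With three iterations and a budget of four, twelve of the fifteen users are selected and no user is selected twice.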

updated: Sat Dec 17 2022 10:20:39 GMT+0000 (UTC)

published: Sat Dec 17 2022 10:20:39 GMT+0000 (UTC)

arXiv
