End-to-End Page-Level Assessment of Handwritten Text Recognition

Enrique Vidal; Alejandro H. Toselli; Antonio Ríos-Vila; Jorge Calvo-Zaragoza

The evaluation of Handwritten Text Recognition (HTR) systems has traditionally used metrics based on the edit distance between HTR and ground truth (GT) transcripts, at both the character and word levels. This is adequate when the experimental protocol assumes that the GT and HTR text lines are the same, which allows edit distances to be computed independently for each given line. Driven by recent advances in pattern recognition, HTR systems increasingly face the end-to-end page-level transcription of a document, where the precision of locating the different text lines and their corresponding reading order (RO) plays a key role. In such a case, the standard metrics do not take into account the inconsistencies that might appear. In this paper, the problem of evaluating HTR systems at the page level is introduced in detail. We analyze the convenience of using a two-fold evaluation, where the transcription accuracy and the RO goodness are considered separately. Different alternatives are proposed, analyzed and empirically compared, both through partially simulated and through real, full end-to-end experiments. Results support the validity of the proposed two-fold evaluation approach. An important conclusion is that such an evaluation can be adequately achieved with just two simple and well-known metrics: the Word Error Rate, which takes transcription sequentiality into account, and the Bag of Words Word Error Rate, re-formulated here, which ignores order. While the latter directly and very accurately assesses intrinsic word recognition errors, the difference between the two metrics correlates gracefully with the Spearman's Foot Rule Distance, a metric which explicitly measures RO errors associated with layout analysis flaws.
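To make the three metrics named in the abstract concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, the bag-of-words rate is written as the smallest number of word insertions, deletions and substitutions needed when order is ignored (which may differ in form from the paper's own re-formulation), and the footrule is left unnormalized.

```python
from collections import Counter


def edit_distance(ref, hyp):
    """Word-level Levenshtein distance between two token lists."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (r != h)))  # match or substitution
        prev = cur
    return prev[-1]


def wer(ref, hyp):
    """Word Error Rate: order-sensitive, normalized by reference length."""
    return edit_distance(ref, hyp) / len(ref)


def bow_wer(ref, hyp):
    """Order-free word error rate (assumed formulation, see lead-in).

    Words missing from the hypothesis can be paired with spurious ones as
    substitutions; the remainder are pure insertions or deletions, so the
    minimal edit count is the larger of the two surpluses.
    """
    f_ref, f_hyp = Counter(ref), Counter(hyp)
    missing = sum((f_ref - f_hyp).values())  # in ref, not matched in hyp
    extra = sum((f_hyp - f_ref).values())    # in hyp, not matched in ref
    return max(missing, extra) / len(ref)


def footrule(order):
    """Unnormalized Spearman footrule: sum of |position - original index|."""
    return sum(abs(i - p) for i, p in enumerate(order))


# Hypothetical page of three lines, recognized perfectly but read with the
# first two lines swapped.
lines = [["the", "quick", "brown"], ["fox", "jumps", "over"], ["the", "lazy", "dog"]]
order = [1, 0, 2]
ref = [w for line in lines for w in line]
hyp = [w for k in order for w in lines[k]]

print(wer(ref, hyp), bow_wer(ref, hyp), footrule(order))
```

In this toy case every word is recognized correctly, so the order-free rate is zero while the ordinary WER is not; the gap between the two comes entirely from the reading-order error, which the footrule also registers.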

updated: Sat Jan 14 2023 15:43:07 GMT+0000 (UTC)

published: Sat Jan 14 2023 15:43:07 GMT+0000 (UTC)

arXiv
