SoftCTC – Semi-Supervised Learning for Text Recognition using Soft Pseudo-Labels

Martin Kišš; Michal Hradiš; Karel Beneš; Petr Buchal; Michal Kula

This paper explores semi-supervised training for sequence tasks such as Optical Character Recognition and Automatic Speech Recognition. We propose a novel loss function, SoftCTC, an extension of CTC that allows multiple transcription variants to be considered at the same time. This makes it possible to omit the confidence-based filtering step, which is otherwise a crucial component of pseudo-labeling approaches to semi-supervised learning. We demonstrate the effectiveness of our method on a challenging handwriting recognition task and conclude that SoftCTC matches the performance of a finely tuned filtering-based pipeline. We also evaluate SoftCTC in terms of computational efficiency, concluding that it is significantly more efficient than a naïve CTC-based approach for training on multiple transcription variants, and we make our GPU implementation public.
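
To make the comparison in the abstract concrete, the sketch below shows one way a naïve CTC-based baseline for several transcription variants might look: the standard CTC forward algorithm is run once per variant, and the resulting likelihoods are mixed using the pseudo-label confidences. This is not SoftCTC and not the authors' GPU implementation. The function names, the blank index, and the choice to combine variants as the negative log of a confidence-weighted mixture are all assumptions made for illustration.

```python
import math

BLANK = 0  # assumed index of the CTC blank symbol


def ctc_likelihood(probs, target):
    """Return P(target | probs) using the standard CTC forward algorithm.

    probs:  list of T frames, each a list of per-symbol probabilities.
    target: list of non-blank symbol indices.
    """
    # Interleave blanks: [a, b] -> [blank, a, blank, b, blank]
    ext = [BLANK]
    for symbol in target:
        ext += [symbol, BLANK]
    size = len(ext)

    # A valid path starts in the leading blank or in the first label.
    alpha = [0.0] * size
    alpha[0] = probs[0][ext[0]]
    if size > 1:
        alpha[1] = probs[0][ext[1]]

    for t in range(1, len(probs)):
        new_alpha = [0.0] * size
        for s in range(size):
            total = alpha[s]
            if s >= 1:
                total += alpha[s - 1]
            # A blank may be skipped only between two different labels.
            if s >= 2 and ext[s] != BLANK and ext[s] != ext[s - 2]:
                total += alpha[s - 2]
            new_alpha[s] = total * probs[t][ext[s]]
        alpha = new_alpha

    # A valid path ends in the last label or in the trailing blank.
    return alpha[-1] + (alpha[-2] if size > 1 else 0.0)


def naive_multi_variant_loss(probs, variants):
    """Hypothetical baseline: one CTC forward pass per transcription variant.

    variants: list of (target, confidence) pairs. The combination rule
    (negative log of the confidence-weighted mixture) is an assumption.
    """
    total_conf = sum(conf for _, conf in variants)
    mixture = sum(
        conf / total_conf * ctc_likelihood(probs, target)
        for target, conf in variants
    )
    return -math.log(mixture)


# Toy input: 2 frames over the alphabet {blank, 1, 2}, uniform posteriors.
frames = [[1 / 3, 1 / 3, 1 / 3], [1 / 3, 1 / 3, 1 / 3]]
loss = naive_multi_variant_loss(frames, [([1], 0.75), ([1, 2], 0.25)])
```

The cost of this baseline grows linearly with the number of variants, since each one needs its own forward pass. This is the kind of overhead the abstract's efficiency comparison refers to.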

updated: Mon Dec 05 2022 10:13:50 GMT+0000 (UTC)

published: Mon Dec 05 2022 10:13:50 GMT+0000 (UTC)

arXiv
