Self-distillation Regularized Connectionist Temporal Classification Loss for Text Recognition: A Simple Yet Effective Approach

Ziyin Zhang; Ning Lu; Minghui Liao; Yongshuai Huang; Cheng Li; Min Wang; Wei Peng

Text recognition methods are developing rapidly. Advanced techniques such as powerful modules, language models, and unsupervised and semi-supervised learning schemes continue to push performance on public benchmarks forward. However, the question of how to better optimize a text recognition model from the perspective of the loss function has been largely overlooked. CTC-based methods, widely used in practice because they balance performance and inference speed well, still suffer from accuracy degradation. This is because CTC loss emphasizes optimizing the entire sequence target while neglecting the learning of individual characters. We propose a self-distillation scheme for CTC-based models to address this issue. It incorporates a framewise regularization term into the CTC loss to emphasize individual supervision, and leverages maximum a posteriori (MAP) estimation of the latent alignment to solve the inconsistency problem that arises in distillation between CTC-based models. We refer to the regularized CTC loss as the Distillation Connectionist Temporal Classification (DCTC) loss. The DCTC loss is module-free: it requires no extra parameters, no additional inference latency, and no additional training data or phases. Extensive experiments on public benchmarks demonstrate that DCTC can boost text recognition model accuracy by up to 2.6% without any of these drawbacks.
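
The abstract does not give the exact form of the DCTC loss, so the following is only a minimal pure-Python sketch of the general idea under stated assumptions: the MAP alignment is taken to be the Viterbi path through the CTC lattice for the ground-truth label, and the framewise term is taken to be the mean negative log-likelihood of that path. The function names and the `weight` parameter are hypothetical and are not taken from the paper.

```python
import math

NEG_INF = float("-inf")

def _extend(target, blank):
    # Interleave blanks: [a, b] -> [blank, a, blank, b, blank]
    ext = [blank]
    for c in target:
        ext += [c, blank]
    return ext

def _predecessors(ext, s, blank):
    # Allowed CTC transitions into state s: stay, advance by one,
    # or skip a blank between two different labels.
    preds = [s]
    if s >= 1:
        preds.append(s - 1)
    if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
        preds.append(s - 2)
    return preds

def _logsumexp(xs):
    m = max(xs)
    if m == NEG_INF:
        return NEG_INF
    return m + math.log(sum(math.exp(x - m) for x in xs))

def ctc_nll(log_probs, target, blank=0):
    """Standard CTC negative log-likelihood via the forward algorithm."""
    ext = _extend(target, blank)
    S = len(ext)
    alpha = [NEG_INF] * S
    alpha[0] = log_probs[0][ext[0]]
    if S > 1:
        alpha[1] = log_probs[0][ext[1]]
    for t in range(1, len(log_probs)):
        alpha = [
            _logsumexp([alpha[p] for p in _predecessors(ext, s, blank)])
            + log_probs[t][ext[s]]
            for s in range(S)
        ]
    return -_logsumexp(alpha[-2:])

def ctc_map_alignment(log_probs, target, blank=0):
    """Viterbi: most probable frame-level path that collapses to `target`."""
    T = len(log_probs)
    ext = _extend(target, blank)
    S = len(ext)
    score = [[NEG_INF] * S for _ in range(T)]
    back = [[0] * S for _ in range(T)]
    score[0][0] = log_probs[0][ext[0]]
    if S > 1:
        score[0][1] = log_probs[0][ext[1]]
    for t in range(1, T):
        for s in range(S):
            best = max(_predecessors(ext, s, blank),
                       key=lambda p: score[t - 1][p])
            score[t][s] = score[t - 1][best] + log_probs[t][ext[s]]
            back[t][s] = best
    # A valid path ends in the last label or the trailing blank.
    s = max(range(max(S - 2, 0), S), key=lambda e: score[T - 1][e])
    if score[T - 1][s] == NEG_INF:
        raise ValueError("target is too long for the number of frames")
    path = [blank] * T
    for t in range(T - 1, -1, -1):
        path[t] = ext[s]
        s = back[t][s]
    return path

def framewise_nll(log_probs, alignment):
    """Mean per-frame negative log-likelihood of a fixed alignment."""
    return -sum(lp[a] for lp, a in zip(log_probs, alignment)) / len(alignment)

def regularized_ctc_loss(log_probs, target, weight=0.1, blank=0):
    """CTC loss plus a framewise term on the MAP alignment (assumed form)."""
    alignment = ctc_map_alignment(log_probs, target, blank)
    return ctc_nll(log_probs, target, blank) + weight * framewise_nll(log_probs, alignment)

# Toy usage: 4 frames, vocabulary {0: blank, 1: 'a', 2: 'b'}, label "ab".
probs = [[0.1, 0.8, 0.1], [0.7, 0.2, 0.1], [0.1, 0.1, 0.8], [0.6, 0.1, 0.3]]
log_probs = [[math.log(p) for p in frame] for frame in probs]
print(ctc_map_alignment(log_probs, [1, 2]))
print(regularized_ctc_loss(log_probs, [1, 2]))
```

In this toy input the per-frame argmax path already collapses to the label, so the MAP alignment coincides with it. In a training framework the alignment would presumably be treated as a fixed pseudo-label (no gradient through the Viterbi step), which is what makes the extra term behave like distillation from the model's own predictions; that reading is an interpretation of the abstract, not a confirmed detail.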

updated: Fri Dec 29 2023 11:06:45 GMT+0000 (UTC)

published: Thu Aug 17 2023 06:32:57 GMT+0000 (UTC)

arXiv
