Writer-Aware CNN for Parsimonious HMM-Based Offline Handwritten Chinese Text Recognition

Zi-Rui Wang; Jun Du; Jia-Ming Wang

Recently, the hybrid convolutional neural network hidden Markov model (CNN-HMM) has been introduced for offline handwritten Chinese text recognition (HCTR) and has achieved state-of-the-art performance. However, modeling each character in the large Chinese vocabulary with a uniform, fixed number of hidden states incurs high memory and computational costs and yields tens of thousands of HMM state classes that are easily confused with one another. Another key issue of CNN-HMM for HCTR is the diversity of writing styles, which strains the model and leads to a significant performance decline for specific writers. To address these issues, we propose a writer-aware CNN based on parsimonious HMM (WCNN-PHMM). First, the PHMM is designed using a data-driven state-tying algorithm that greatly reduces the total number of HMM states. This not only yields a compact CNN, because the same or similar radicals in different Chinese characters share states, but also improves recognition accuracy, because the tied states are modeled more accurately and are less easily confused. Second, the WCNN integrates each convolutional layer with one adaptive layer fed by a writer-dependent vector, namely the writer code, to extract the irrelevant variability in writer information and thereby improve recognition performance. The parameters of the writer-adaptive layers are jointly optimized with the other network parameters in the training stage, while a multiple-pass decoding strategy is adopted to learn the writer code and generate recognition results. Validated on the ICDAR 2013 competition of the CASIA-HWDB database, the more compact WCNN-PHMM with a 7360-class vocabulary achieves a relative character error rate (CER) reduction of 16.6% over the conventional CNN-HMM without considering language modeling. By adopting a powerful hybrid language model (an N-gram language model combined with a recurrent neural network language model), the CER of WCNN-PHMM is reduced to 3.17%.
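
As a rough illustration of the writer-aware layer described in the abstract, the following Python sketch shows one way a writer code could modulate a convolutional layer's output. The abstract only states that each convolutional layer is integrated with an adaptive layer fed by the writer code. The additive per-channel offset, the linear form of the adaptive layer, the ReLU, and the names `adaptive_layer` and `writer_aware_activation` are all assumptions made for this example, not the authors' implementation.

```python
def adaptive_layer(writer_code, weights, biases):
    """Map a writer code to one offset per feature-map channel.

    Assumption: a plain linear map (weights has one row per channel).
    """
    return [
        sum(w * c for w, c in zip(row, writer_code)) + b
        for row, b in zip(weights, biases)
    ]


def writer_aware_activation(feature_maps, writer_code, weights, biases):
    """Add the writer-dependent offset to every position of each channel,
    then apply ReLU. feature_maps is a list of 2-D channels (lists of rows)
    standing in for the output of a convolutional layer.
    """
    offsets = adaptive_layer(writer_code, weights, biases)
    return [
        [[max(0.0, value + offset) for value in row] for row in channel]
        for channel, offset in zip(feature_maps, offsets)
    ]


# Toy data (hypothetical): 2 channels of 2x2 conv outputs, 3-dim writer code.
feature_maps = [
    [[0.5, -1.0], [2.0, 0.0]],
    [[-0.5, 1.0], [0.0, -2.0]],
]
weights = [
    [1.0, 0.0, 0.0],  # channel 0 responds to the first code dimension
    [0.0, 1.0, 0.0],  # channel 1 responds to the second code dimension
]
biases = [0.0, 0.0]

# An all-zero writer code leaves the layer writer-independent.
neutral = writer_aware_activation(feature_maps, [0.0, 0.0, 0.0], weights, biases)
# A non-zero writer code shifts each channel before the nonlinearity.
shifted = writer_aware_activation(feature_maps, [1.0, -1.0, 0.0], weights, biases)
```

In this sketch the same convolutional output yields different activations for different writer codes, which is the general idea of conditioning the network on the writer. Learning the writer code itself through multiple decoding passes is not shown.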

updated: Fri Sep 20 2019 02:50:28 GMT+0000 (UTC)

published: Mon Dec 24 2018 02:38:08 GMT+0000 (UTC)

arXiv
