PIMNet: A Parallel, Iterative and Mimicking Network for Scene Text Recognition

Zhi Qiao; Yu Zhou; Jin Wei; Wei Wang; Yuan Zhang; Ning Jiang; Hongbin Wang; Weiping Wang


Scene text recognition has attracted growing attention because of its many applications. Most state-of-the-art methods adopt an encoder-decoder framework with an attention mechanism that generates text autoregressively from left to right. Despite their convincing performance, their speed is limited by the one-by-one decoding strategy. In contrast, non-autoregressive models predict all characters in parallel with a much shorter inference time, but their accuracy falls considerably behind that of their autoregressive counterparts. In this paper, we propose a Parallel, Iterative and Mimicking Network (PIMNet) to balance accuracy and efficiency. Specifically, PIMNet adopts a parallel attention mechanism to predict the text faster and an iterative generation mechanism to make the predictions more accurate. In each iteration, the context information is fully explored. To improve the learning of the hidden layer, we exploit mimicking learning in the training phase: an additional autoregressive decoder is adopted, and the parallel decoder mimics it by fitting the outputs of the hidden layer. Because the two decoders share a backbone, the proposed PIMNet can be trained end-to-end without pre-training. During inference, the autoregressive decoder branch is removed for faster speed. Extensive experiments on public benchmarks demonstrate the effectiveness and efficiency of PIMNet. Our code will be available at https://github.com/Pay20Y/PIMNet.
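The interplay between parallel prediction and iterative refinement can be illustrated with a small, model-free sketch. It is not the authors' implementation: the abstract does not say how positions are committed between iterations, so the rule used here (commit the most confident open positions in each pass) is an assumption, and the names `iterative_parallel_decode` and `toy_predictor` are hypothetical. The toy predictor stands in for the parallel attention decoder and only guesses a character correctly once its left neighbour is known, which mimics a dependence on context.

```python
from typing import Callable, List, Optional, Tuple

MASK = None  # marks a position that has not been committed yet

# A predictor sees the partially filled sequence and returns one
# (character, confidence) guess per position in a single parallel pass.
Predictor = Callable[[List[Optional[str]]], List[Tuple[str, float]]]


def iterative_parallel_decode(predict: Predictor, length: int, iterations: int) -> str:
    """Fill `length` positions over at most `iterations` parallel passes.

    Assumed commit rule (not taken from the abstract): in each pass, keep
    only the most confident guesses among the still-open positions.
    """
    if iterations < 1:
        raise ValueError("iterations must be at least 1")
    seq: List[Optional[str]] = [MASK] * length
    per_pass = -(-length // iterations)  # ceiling division
    for _ in range(iterations):
        open_pos = [i for i, c in enumerate(seq) if c is MASK]
        if not open_pos:
            break
        guesses = predict(seq)  # all positions predicted at once
        # Most confident open positions first; the sort is stable on ties.
        open_pos.sort(key=lambda i: guesses[i][1], reverse=True)
        for i in open_pos[:per_pass]:
            seq[i] = guesses[i][0]
    return "".join(c if c is not MASK else "?" for c in seq)


def toy_predictor(target: str) -> Predictor:
    """Hypothetical stand-in for a parallel decoder.

    The first character is always recognised; every other character is
    recognised only when its left neighbour has already been committed.
    """
    def predict(seq: List[Optional[str]]) -> List[Tuple[str, float]]:
        out = []
        for i, ch in enumerate(target):
            if i == 0:
                out.append((ch, 0.9))
            elif seq[i - 1] is not MASK:
                out.append((ch, 0.8))
            else:
                out.append(("#", 0.3))  # low-confidence wrong guess
        return out
    return predict


one_pass = iterative_parallel_decode(toy_predictor("text"), length=4, iterations=1)
four_pass = iterative_parallel_decode(toy_predictor("text"), length=4, iterations=4)
print(one_pass, four_pass)
```

With this toy predictor, a single parallel pass yields `t###`, while four passes recover `text`, because each pass gives the next position the context it needs. The number of passes is the knob that trades speed against accuracy in this sketch.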

updated: Thu Sep 09 2021 10:11:07 GMT+0000 (UTC)

published: Thu Sep 09 2021 10:11:07 GMT+0000 (UTC)

arXiv
