TextAdaIN: Paying Attention to Shortcut Learning in Text Recognizers

Oren Nuriel; Sharon Fogel; Ron Litman

Leveraging the characteristics of convolutional layers, neural networks are extremely effective for pattern recognition tasks. However, in some cases their decisions are based on unintended information, which leads to high performance on standard benchmarks but also to unintuitive failures and a lack of generalization to challenging testing conditions. Recent work has termed this "shortcut learning" and addressed its presence in multiple domains. In text recognition, we reveal another such shortcut, whereby recognizers depend too heavily on local image statistics. Motivated by this, we suggest an approach that regulates the reliance on local statistics and thereby improves text recognition performance. Our method, termed TextAdaIN, creates local distortions in the feature map which prevent the network from overfitting to local statistics. It does so by viewing each feature map as a sequence of elements and deliberately mismatching fine-grained feature statistics between elements in a mini-batch. Despite TextAdaIN's simplicity, extensive experiments show its effectiveness compared to other, more complicated methods. TextAdaIN achieves state-of-the-art results on standard handwritten text recognition benchmarks. Additionally, it generalizes to multiple architectures and to the domain of scene text recognition. Furthermore, we demonstrate that integrating TextAdaIN improves robustness to more challenging testing conditions.
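The mismatching of fine-grained statistics described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. Several details are assumptions that the abstract does not state: the "elements" are taken to be equal-width windows along the width axis, the statistics are a per-channel mean and standard deviation over each window, and donors are chosen by a random permutation. The function name `text_adain_sketch` and its parameters are hypothetical.

```python
import numpy as np


def text_adain_sketch(x, num_windows, rng, eps=1e-5):
    """Swap per-window feature statistics between random elements of a mini-batch.

    x: feature maps of shape (B, C, H, W); W must be divisible by num_windows.
    Assumption: an "element" is one of num_windows equal slices along the width.
    """
    b, c, h, w = x.shape
    if w % num_windows != 0:
        raise ValueError("W must be divisible by num_windows")
    win = w // num_windows

    # View each feature map as a sequence of windows:
    # (B, C, H, W) -> (B, C, H, K, win) -> (B*K, C, H, win)
    elems = x.reshape(b, c, h, num_windows, win).transpose(0, 3, 1, 2, 4)
    elems = elems.reshape(b * num_windows, c, h, win)

    # Per-element, per-channel statistics (assumed granularity).
    mean = elems.mean(axis=(2, 3), keepdims=True)
    std = np.sqrt(elems.var(axis=(2, 3), keepdims=True) + eps)

    # Normalize each element, then re-scale it with the statistics of a
    # randomly chosen donor element from anywhere in the mini-batch.
    perm = rng.permutation(b * num_windows)
    mixed = (elems - mean) / std * std[perm] + mean[perm]

    # Restore the original (B, C, H, W) layout.
    mixed = mixed.reshape(b, num_windows, c, h, win).transpose(0, 2, 3, 1, 4)
    return mixed.reshape(b, c, h, w)


rng = np.random.default_rng(0)
features = rng.normal(size=(4, 3, 8, 16))
distorted = text_adain_sketch(features, num_windows=4, rng=rng)
print(distorted.shape)
```

Because each window receives the mean and standard deviation of some other window, the content of the feature map is kept while its local statistics are distorted. An operation of this kind would presumably be applied only during training; that, too, is an assumption rather than something the abstract specifies.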

updated: Wed Nov 17 2021 09:34:59 GMT+0000 (UTC)

published: Sun May 09 2021 10:47:48 GMT+0000 (UTC)

arXiv

