Distribution Mismatch Correction for Improved Robustness in Deep Neural Networks

Alexander Fuchs; Christian Knoll; Franz Pernkopf

Deep neural networks rely heavily on normalization methods to improve their performance and learning behavior. Although normalization methods have spurred the development of increasingly deep and efficient architectures, they also increase vulnerability to noise and input corruptions. In most applications, however, noise is ubiquitous and diverse; this often leads to complete failure of machine learning systems, because they cannot cope with mismatches between the input distributions at training and test time. The most common normalization method, batch normalization, reduces the distribution shift during training but is agnostic to changes in the input distribution at test time. This makes batch normalization prone to performance degradation whenever noise is present at test time. Sample-based normalization methods can correct linear transformations of the activation distribution but cannot mitigate changes in the distribution shape; this leaves the network vulnerable to distribution changes that cannot be reflected in the normalization parameters. We propose an unsupervised, non-parametric distribution correction method that adapts the activation distribution of each layer. This reduces the mismatch between the training and test-time distributions by minimizing the 1-D Wasserstein distance. In our experiments, we empirically show that the proposed method effectively reduces the impact of intense image corruptions and thus improves classification performance without the need for retraining or fine-tuning the model.
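
As a rough illustration of the idea in the abstract (not the authors' implementation), the following NumPy sketch compares a linear mean/variance correction with non-parametric quantile matching on synthetic one-dimensional "activations". Quantile matching is swapped in here because, in one dimension, it is the monotone map that transports one empirical distribution onto another; whether the paper uses exactly this procedure is an assumption. The function names (`wasserstein_1d`, `standardize_to`, `quantile_match`) and the synthetic Gaussian and exponential samples are hypothetical stand-ins for real layer activations.

```python
import numpy as np


def wasserstein_1d(a, b, n_quantiles=512):
    """Approximate the 1-D Wasserstein-1 distance by comparing empirical quantiles."""
    q = (np.arange(n_quantiles) + 0.5) / n_quantiles
    return float(np.mean(np.abs(np.quantile(a, q) - np.quantile(b, q))))


def standardize_to(x, ref):
    """Linear correction: give x the mean and standard deviation of ref."""
    return (x - x.mean()) / x.std() * ref.std() + ref.mean()


def quantile_match(x, ref, n_quantiles=512):
    """Non-parametric correction: send each value of x to the ref value
    at the same quantile (piecewise-linear between quantile knots)."""
    q = (np.arange(n_quantiles) + 0.5) / n_quantiles
    x_q = np.quantile(x, q)      # quantile knots of the shifted distribution
    ref_q = np.quantile(ref, q)  # quantile knots of the reference distribution
    return np.interp(x, x_q, ref_q)


rng = np.random.default_rng(0)
# Hypothetical stand-ins: Gaussian "training" activations and
# shifted, scaled, skewed "test" activations.
train_act = rng.normal(0.0, 1.0, size=20_000)
test_act = rng.exponential(1.0, size=20_000) * 2.0 + 0.5

d_raw = wasserstein_1d(test_act, train_act)
d_linear = wasserstein_1d(standardize_to(test_act, train_act), train_act)
d_matched = wasserstein_1d(quantile_match(test_act, train_act), train_act)

print(f"uncorrected:       {d_raw:.4f}")
print(f"mean/variance fix: {d_linear:.4f}")
print(f"quantile matching: {d_matched:.4f}")
```

On this synthetic data, the linear correction removes the shift and scale mismatch but leaves a residual distance caused by the skewed shape, while quantile matching drives the distance close to zero.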

updated: Tue Oct 05 2021 11:36:25 GMT+0000 (UTC)

published: Tue Oct 05 2021 11:36:25 GMT+0000 (UTC)

arXiv
