The Effects of Regularization and Data Augmentation are Class Dependent

Randall Balestriero; Leon Bottou; Yann LeCun

Regularization is a fundamental technique to prevent over-fitting and to improve generalization performance by constraining a model's complexity. Current deep networks rely heavily on regularizers such as data augmentation (DA) or weight decay, and employ structural risk minimization, i.e. cross-validation, to select the optimal regularization hyper-parameters. In this study, we demonstrate that techniques such as DA or weight decay produce a model whose reduced complexity is unfair across classes. The optimal amount of DA or weight decay found by cross-validation leads to disastrous model performance on some classes. For example, on Imagenet with a resnet50, the "barn spider" classification test accuracy falls from 68% to 46% solely by introducing random crop DA during training. Even more surprisingly, such a performance drop also appears when introducing uninformative regularization techniques such as weight decay. These results demonstrate that our search for ever-increasing generalization performance, averaged over all classes and samples, has left us with models and regularizers that silently sacrifice performance on some classes. This scenario can become dangerous when deploying a model on downstream tasks. For example, an Imagenet pre-trained resnet50 deployed on INaturalist sees its performance fall from 70% to 30% on class #8889 when random crop DA is introduced during the Imagenet pre-training phase. These results demonstrate that designing novel regularizers without class-dependent bias remains an open research question.
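
The abstract's central point, that an accuracy averaged over all classes can rise while one class quietly degrades, can be illustrated with a minimal sketch. The labels and predictions below are hand-built toy values, not data or results from the paper, and the helper names `per_class_accuracy` and `overall_accuracy` are hypothetical.

```python
from collections import Counter


def per_class_accuracy(y_true, y_pred):
    """Return {class: accuracy}, computed separately for each true class."""
    totals = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {c: hits[c] / totals[c] for c in sorted(totals)}


def overall_accuracy(y_true, y_pred):
    """Accuracy averaged over all samples, ignoring class membership."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Toy example (assumption: 3 classes, 4 test samples each).
y_true = [0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2]
# Hypothetical predictions of a model trained without the regularizer.
pred_baseline = [0, 0, 0, 1, 1, 1, 0, 0, 2, 2, 2, 0]
# Hypothetical predictions of a model trained with the regularizer.
pred_regularized = [0, 0, 0, 0, 1, 1, 1, 1, 2, 0, 0, 0]

acc_base = per_class_accuracy(y_true, pred_baseline)
acc_reg = per_class_accuracy(y_true, pred_regularized)
overall_base = overall_accuracy(y_true, pred_baseline)
overall_reg = overall_accuracy(y_true, pred_regularized)

# Per-class change; the most negative entry is the class that pays the price.
delta = {c: acc_reg[c] - acc_base[c] for c in acc_base}
worst_class = min(delta, key=delta.get)

print("overall:", overall_base, "->", overall_reg)
print("per-class change:", delta)
print("most degraded class:", worst_class)
```

In this toy setting the overall accuracy improves, so model selection based on the average alone would prefer the regularized model even though one class loses most of its accuracy. Reporting the per-class change alongside the average is one simple way to surface that kind of trade-off.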

updated: Fri Apr 08 2022 20:03:26 GMT+0000 (UTC)

published: Thu Apr 07 2022 17:57:29 GMT+0000 (UTC)

arXiv

