Data augmentation instead of explicit regularization

Alex Hernández-García; Peter König

Unlike most machine learning models, modern deep artificial neural networks typically include multiple components that contribute to regularization. Although some explicit regularization techniques, such as weight decay and dropout, require costly fine-tuning of sensitive hyperparameters, their interplay with other elements that provide implicit regularization is not yet well understood. Shedding light on these interactions is key to using computational resources efficiently and may contribute to solving the puzzle of generalization in deep learning. Here, we first provide formal definitions of explicit and implicit regularization that help to understand essential differences between techniques. Second, we contrast data augmentation with weight decay and dropout. Our results show that visual object categorization models trained with data augmentation alone achieve the same or higher performance than models also trained with weight decay and dropout, as is common practice. We conclude that the contribution of weight decay and dropout to generalization is not only superfluous when sufficient implicit regularization is provided, but that such techniques can also dramatically deteriorate performance if the hyperparameters are not carefully tuned for the architecture and data set. In contrast, data augmentation systematically provides large generalization gains and does not require hyperparameter re-tuning. In view of our results, we suggest optimizing neural networks without weight decay and dropout to save computational resources, and hence carbon emissions, and focusing more on data augmentation and other inductive biases to improve performance and robustness.
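
The contrast drawn in the abstract can be made concrete with a small sketch. The code below is illustrative only and is not the authors' implementation: the function names, the choice of transforms (horizontal flip and wrap-around shift), and the parameters `max_shift`, `lam`, and `rate` are all assumptions made for this example. It shows that weight decay and dropout act on the model (a penalty on the weights, a mask on the activations) and each carry a hyperparameter, whereas data augmentation only transforms the inputs.

```python
import random


def augment(image, rng, max_shift=2):
    """Data augmentation: transform the input, leave the model untouched.

    `image` is a list of rows (H x W). The flip and wrap-around shift are
    illustrative assumptions, not the paper's augmentation schemes.
    """
    if rng.random() < 0.5:
        image = [row[::-1] for row in image]  # horizontal flip
    dx = rng.randint(-max_shift, max_shift)   # inclusive on both ends
    # Circular horizontal shift; a real pipeline would pad or crop instead.
    return [row[-dx:] + row[:-dx] for row in image]


def weight_decay_penalty(weights, lam):
    """Explicit regularization: L2 penalty added to the loss, scaled by lam."""
    return 0.5 * lam * sum(w * w for w in weights)


def dropout(activations, rate, rng):
    """Explicit regularization: inverted dropout applied during training."""
    scale = 1.0 / (1.0 - rate)
    return [a * scale if rng.random() >= rate else 0.0 for a in activations]


rng = random.Random(0)
image = [[1, 2, 3, 4], [5, 6, 7, 8]]
augmented = augment(image, rng)                  # same pixels, rearranged
penalty = weight_decay_penalty([1.0, 2.0], 0.1)  # depends on the choice of lam
dropped = dropout([1.0] * 8, 0.5, rng)           # depends on the choice of rate
```

In this sketch, `lam` and `rate` are the sensitive hyperparameters that the abstract says need re-tuning per architecture and data set, while `augment` can be reused unchanged across models.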

updated: Thu Nov 12 2020 10:35:59 GMT+0000 (UTC)

published: Mon Jun 11 2018 08:10:18 GMT+0000 (UTC)

arXiv
