Dropout Reduces Underfitting

Zhuang Liu; Zhiqiu Xu; Joseph Jin; Zhiqiang Shen; Trevor Darrell

Introduced by Hinton et al. in 2012, dropout has stood the test of time as a regularizer for preventing overfitting in neural networks. In this study, we demonstrate that dropout can also mitigate underfitting when used at the start of training. During the early phase, we find that dropout reduces the directional variance of gradients across mini-batches and helps align the mini-batch gradients with the gradient of the entire dataset. This helps counteract the stochasticity of SGD and limits the influence of individual batches on model training. Our findings lead to a solution for improving performance in underfitting models, early dropout, in which dropout is applied only during the initial phase of training and turned off afterwards. Models equipped with early dropout achieve lower final training loss than their counterparts without dropout. Additionally, we explore a symmetric technique for regularizing overfitting models, late dropout, in which dropout is not used in the early iterations and is activated only later in training. Experiments on ImageNet and various vision tasks demonstrate that our methods consistently improve generalization accuracy. Our results encourage more research on understanding regularization in deep learning, and our methods can be useful tools for future neural network training, especially in the era of large data. Code is available at https://github.com/facebookresearch/dropout.
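Early and late dropout differ only in *when* dropout is active, so both can be expressed as a schedule on the dropout rate over training iterations. The sketch below is a minimal illustration, assuming a hard step schedule that switches at a hypothetical iteration `cutoff` with a hypothetical rate `p`. The names `dropout_rate` and `inverted_dropout` are illustrative only and are not taken from the facebookresearch/dropout repository, whose actual schedule shapes and defaults may differ.

```python
import random
from typing import List


def dropout_rate(step: int, mode: str, p: float, cutoff: int) -> float:
    """Return the dropout probability for a given training iteration.

    Assumption: a hard step schedule switching at iteration `cutoff`.
      "early":    rate p while step < cutoff, then 0 (dropout turned off)
      "late":     rate 0 while step < cutoff, then p (dropout turned on)
      "standard": rate p for the whole run
      "none":     rate 0 for the whole run
    """
    if not 0.0 <= p < 1.0:
        raise ValueError("p must be in [0, 1)")
    if mode == "early":
        return p if step < cutoff else 0.0
    if mode == "late":
        return 0.0 if step < cutoff else p
    if mode == "standard":
        return p
    if mode == "none":
        return 0.0
    raise ValueError(f"unknown mode: {mode!r}")


def inverted_dropout(x: List[float], p: float, rng: random.Random) -> List[float]:
    """Inverted dropout: zero each unit with probability p and scale the
    surviving units by 1 / (1 - p), so the expected activation is unchanged."""
    if p == 0.0:
        return list(x)  # dropout inactive at this step: identity
    keep = 1.0 - p
    return [xi / keep if rng.random() < keep else 0.0 for xi in x]


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical settings: 5 iterations, switch after iteration 3, p = 0.1.
    for step in range(5):
        p = dropout_rate(step, "early", p=0.1, cutoff=3)
        out = inverted_dropout([1.0, 2.0, 3.0, 4.0], p, rng)
        print(step, p, out)
```

With `mode="early"`, `[dropout_rate(s, "early", 0.1, 3) for s in range(5)]` gives `[0.1, 0.1, 0.1, 0.0, 0.0]`, and the `"late"` mode mirrors it as `[0.0, 0.0, 0.0, 0.1, 0.1]`. In a real framework the same idea amounts to changing the rate of existing dropout layers at the switch iteration rather than adding or removing layers; the step shape here is one assumed choice, and smoother schedules are equally expressible.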

updated: Thu Mar 02 2023 18:59:15 GMT+0000 (UTC)

published: Thu Mar 02 2023 18:59:15 GMT+0000 (UTC)

arXiv
