Guidelines for the Regularization of Gammas in Batch Normalization for Deep Residual Networks

Bum Jun Kim; Hyeyeon Choi; Hyeonah Jang; Dong Gu Lee; Wonseok Jeong; Sang Woo Kim

L2 regularization of the weights in neural networks is widely used as a standard training trick. However, L2 regularization of gamma, a trainable parameter of batch normalization, remains largely undiscussed and is applied in different ways depending on the library and the practitioner. In this paper, we study whether L2 regularization of gamma is valid. To explore this issue, we consider two approaches: 1) variance control to make the residual network behave like an identity mapping and 2) stable optimization through improvement of the effective learning rate. Through these two analyses, we identify which gammas are desirable and which are undesirable targets for L2 regularization, and we propose four guidelines for managing them. In several experiments, applying L2 regularization to each of four categories of gamma increased or decreased performance in a manner consistent with our four guidelines. The proposed guidelines were validated across various tasks and architectures, including variants of residual networks and transformers.
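To make the library-dependent behavior mentioned in the abstract concrete, the following is a minimal pure-Python sketch of the two common conventions: applying the L2 term to the weights only, or to the weights and gamma together. It does not implement the paper's four guidelines, which the abstract does not spell out. The parameter names (`conv1.weight`, `bn1.gamma`, `bn1.beta`), the helper functions, and the coefficient values are all hypothetical and chosen only for illustration.

```python
from typing import Callable, Dict, List

Params = Dict[str, List[float]]


def l2_penalty(params: Params, coeff: float, selected: Callable[[str], bool]) -> float:
    """Return 0.5 * coeff * (sum of squares) over the selected parameters only."""
    return 0.5 * coeff * sum(
        v * v
        for name, values in params.items()
        if selected(name)
        for v in values
    )


def sgd_step(params: Params, grads: Params, lr: float, coeff: float,
             selected: Callable[[str], bool]) -> Params:
    """One plain SGD step; the L2 term adds coeff * value to the gradient
    of the selected parameters and leaves the others untouched."""
    updated = {}
    for name, values in params.items():
        decay = coeff if selected(name) else 0.0
        updated[name] = [v - lr * (g + decay * v)
                         for v, g in zip(values, grads[name])]
    return updated


def weights_only(name: str) -> bool:
    # Convention A (assumed): regularize weights, exclude gamma and beta.
    return name.endswith(".weight")


def weights_and_gamma(name: str) -> bool:
    # Convention B (assumed): regularize weights and gamma, exclude beta.
    return name.endswith((".weight", ".gamma"))


# Hypothetical toy parameters; task gradients are set to zero so that
# only the effect of the L2 term is visible.
params = {"conv1.weight": [1.0, -2.0], "bn1.gamma": [1.0, 1.0], "bn1.beta": [0.0, 0.0]}
grads = {name: [0.0] * len(values) for name, values in params.items()}

step_a = sgd_step(params, grads, lr=0.1, coeff=0.1, selected=weights_only)
step_b = sgd_step(params, grads, lr=0.1, coeff=0.1, selected=weights_and_gamma)
```

Under convention A, gamma is unchanged by the step, whereas under convention B it shrinks toward zero. Which gammas should fall into which group is the question the paper's guidelines address.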

updated: Sun May 15 2022 11:22:25 GMT+0000 (UTC)

published: Sun May 15 2022 11:22:25 GMT+0000 (UTC)

arXiv
