Beyond BatchNorm: Towards a General Understanding of Normalization in Deep Learning

Ekdeep Singh Lubana; Robert P. Dick; Hidenori Tanaka

Inspired by BatchNorm, there has been an explosion of normalization layers in deep learning. Recent works have identified a multitude of beneficial properties in BatchNorm to explain its success. However, given the pursuit of alternative normalization techniques, these properties need to be generalized so that any given layer's success/failure can be accurately predicted. In this work, we take a first step towards this goal by extending known properties of BatchNorm in randomly initialized deep neural networks (DNNs) to nine recently proposed normalization layers. Our primary findings follow: (i) Similar to BatchNorm, activations-based normalization layers can avoid exploding activations in ResNets; (ii) Use of GroupNorm ensures rank of activations is at least $\Omega(\sqrt{\text{width}/\text{Group Size}})$, thus explaining why LayerNorm witnesses slow optimization speed; (iii) Small group sizes result in large gradient norm in earlier layers, hence justifying training instability issues in Instance Normalization and illustrating a speed-stability tradeoff in GroupNorm. Overall, our analysis reveals several general mechanisms that explain the success of normalization techniques in deep learning, providing us with a compass to systematically explore the vast design space of DNN normalization layers.
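
To make the role of group size in finding (ii) concrete, the following is a minimal sketch, not the authors' code. It assumes a single sample represented as a flat feature vector, contiguous groups, and no learnable scale or shift. The names `group_norm` and `rank_bound_scale` are hypothetical, and `rank_bound_scale` returns only the scaling term inside the Ω bound, since the abstract does not give the constant.

```python
import math


def group_norm(x, group_size, eps=1e-5):
    """Normalize each contiguous group of features to zero mean and unit variance.

    Assumption: x is one sample's feature vector (a list of floats) and
    groups are contiguous slices of length group_size.
    """
    if len(x) % group_size != 0:
        raise ValueError("width must be divisible by group_size")
    out = []
    for start in range(0, len(x), group_size):
        group = x[start:start + group_size]
        mean = sum(group) / group_size
        var = sum((v - mean) ** 2 for v in group) / group_size
        out.extend((v - mean) / math.sqrt(var + eps) for v in group)
    return out


def rank_bound_scale(width, group_size):
    """Scaling term sqrt(width / group_size) from finding (ii).

    The constant hidden by the Omega notation is not stated in the
    abstract, so this is only the growth rate, not an actual rank.
    """
    return math.sqrt(width / group_size)


x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]

# Group size equal to the width: one group covers every feature,
# which corresponds to LayerNorm.
layer_norm_like = group_norm(x, group_size=len(x))

# Smaller groups: each pair of features is normalized independently.
grouped = group_norm(x, group_size=2)

# The scaling term shrinks to 1 when the group spans the full width.
scale_full_width = rank_bound_scale(64, 64)
scale_small_groups = rank_bound_scale(64, 4)
```

With the group spanning the full width, the scaling term stays at 1 regardless of how wide the layer is, which is consistent with the abstract's explanation of slow optimization under LayerNorm. Smaller groups raise the term, while finding (iii) indicates they also raise gradient norms in earlier layers.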

updated: Thu Jun 10 2021 17:51:30 GMT+0000 (UTC)

published: Thu Jun 10 2021 17:51:30 GMT+0000 (UTC)

arXiv
