Normalization Layers Are All That Sharpness-Aware Minimization Needs

Maximilian Mueller; Tiffany Vlaar; David Rolnick; Matthias Hein

Sharpness-aware minimization (SAM) was proposed to reduce sharpness of minima and has been shown to enhance generalization performance in various settings. In this work we show that perturbing only the affine normalization parameters (comprising less than 0.1% of the total parameters) in the adversarial step of SAM outperforms perturbing all of the parameters. This finding generalizes to different SAM variants and both ResNet (Batch Normalization) and Vision Transformer (Layer Normalization) architectures. We consider alternative sparse perturbation approaches and find that these do not achieve similar performance enhancement at such extreme sparsity levels, showing that this behaviour is unique to the normalization layers. Although our findings reaffirm the effectiveness of SAM in improving generalization performance, they cast doubt on whether this is solely caused by reduced sharpness. The code for our experiments is publicly available at https://github.com/mueller-mp/SAM-ON.
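The adversarial step that the abstract restricts to normalization parameters can be sketched in a few lines. Below is a minimal, framework-free Python sketch of one SAM update in which the perturbation is applied only to a chosen subset S of parameters. It follows the standard SAM recipe:

1. Compute the gradient g at w.
2. Form eps = rho * g_S / ||g_S||_2, which is nonzero only on the entries in S.
3. Update all parameters using the gradient evaluated at w + eps.

Several assumptions are made, and none of this is the code from the SAM-ON repository:
- The norm is taken over the perturbed subset only.
- Parameters are stored as plain lists keyed by hypothetical names such as `bn.weight` and `bn.bias`. These stand in for the affine scale and shift of BatchNorm or LayerNorm.
- The toy quadratic loss is purely illustrative.

```python
import math
from typing import Callable, Dict, List, Set

# Illustrative layout: parameter name -> flat list of floats.
Params = Dict[str, List[float]]
GradFn = Callable[[Params], Params]


def sam_subset_step(
    params: Params,
    grad_fn: GradFn,
    perturb_names: Set[str],
    rho: float = 0.05,
    lr: float = 0.1,
) -> Params:
    """One SAM step that perturbs only the parameters named in perturb_names.

    Assumption (not taken from the paper's code): the perturbation norm is
    computed over the perturbed subset only.
    """
    # 1) Gradient at the current point w.
    grad = grad_fn(params)

    # 2) L2 norm of the gradient restricted to the perturbed subset.
    sq = sum(g * g for name in perturb_names for g in grad[name])
    scale = rho / (math.sqrt(sq) + 1e-12)  # small constant avoids division by zero

    # 3) Adversarial point w + eps, where eps is nonzero only on the subset.
    perturbed = {
        name: (
            [w + scale * g for w, g in zip(values, grad[name])]
            if name in perturb_names
            else list(values)
        )
        for name, values in params.items()
    }

    # 4) Gradient evaluated at the perturbed point.
    adv_grad = grad_fn(perturbed)

    # 5) Descend from the ORIGINAL point with the adversarial gradient,
    #    updating every parameter (not only the perturbed ones).
    return {
        name: [w - lr * g for w, g in zip(values, adv_grad[name])]
        for name, values in params.items()
    }


def quadratic_grad(params: Params) -> Params:
    # Toy loss L(w) = 0.5 * sum(w_i^2), so dL/dw_i = w_i (illustrative only).
    return {name: list(values) for name, values in params.items()}


if __name__ == "__main__":
    # Hypothetical names: "bn.*" stands in for affine normalization parameters.
    params = {"conv.weight": [1.0, -2.0], "bn.weight": [0.5], "bn.bias": [0.0]}
    print(sam_subset_step(params, quadratic_grad, {"bn.weight", "bn.bias"}))
```

In this sketch, passing every parameter name in `perturb_names` gives ordinary SAM. Passing an empty set leaves w unperturbed, so the step reduces to plain gradient descent. Restricting the set to the normalization parameters corresponds to the "only normalization" variant the abstract describes, under the norm assumption stated above.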

arXiv, published Wed Jun 07 2023 08:05:46 GMT+0000 (UTC); last updated Wed Jun 07 2023 08:05:46 GMT+0000 (UTC)
