Sharpness-Aware Training for Free

Jiawei Du; Daquan Zhou; Jiashi Feng; Vincent Y. F. Tan; Joey Tianyi Zhou

Modern deep neural networks (DNNs) achieve state-of-the-art performance but are typically over-parameterized. Without customized training strategies, this over-parameterization can result in an undesirably large generalization error. Recently, a line of research under the name Sharpness-Aware Minimization (SAM) has shown that minimizing a sharpness measure, which reflects the geometry of the loss landscape, can significantly reduce the generalization error. However, SAM-like methods incur a two-fold computational overhead relative to the given base optimizer (e.g., SGD) in order to approximate the sharpness measure. In this paper, we propose Sharpness-Aware Training for Free, or SAF, which mitigates the sharp landscape at almost zero additional computational cost over the base optimizer. Intuitively, SAF achieves this by avoiding sudden drops in the loss in sharp local minima throughout the trajectory of the weight updates. Specifically, we propose a novel trajectory loss, based on the KL-divergence between the outputs of the DNN with the current weights and with past weights, as a replacement for SAM's sharpness measure. This loss captures the rate of change of the training loss along the model's update trajectory. By minimizing it, SAF ensures convergence to a flat minimum with improved generalization capabilities. Extensive empirical results show that SAF minimizes the sharpness in the same way that SAM does, yielding better results on the ImageNet dataset with essentially the same computational cost as the base optimizer.
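The trajectory loss described in the abstract can be illustrated with a minimal pure-Python sketch. The abstract states only that the loss is based on the KL-divergence between the network's outputs under current and past weights, so the details here are assumptions for illustration: the direction KL(past || current), the softmax `temperature`, the `weight` that mixes the term into the task loss, and the function names are all hypothetical. The sketch also assumes the past outputs are logits cached from an earlier point in training, so that computing the term would need no extra forward or backward pass.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over one list of logits."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def trajectory_loss(current_logits, past_logits, temperature=1.0):
    """Mean KL(past || current) over a batch of per-sample logits.

    past_logits are assumed to be cached outputs from earlier weights
    and are treated as constants (hypothetical setup, not the paper's
    confirmed formulation).
    """
    total = 0.0
    for cur, past in zip(current_logits, past_logits):
        p_past = softmax(past, temperature)
        p_cur = softmax(cur, temperature)
        total += kl_divergence(p_past, p_cur)
    return total / len(current_logits)


def total_loss(task_loss, current_logits, past_logits, weight=0.3, temperature=1.0):
    """Task loss plus the weighted trajectory term (weight is an assumed value)."""
    return task_loss + weight * trajectory_loss(current_logits, past_logits, temperature)
```

When the current outputs match the cached ones, the trajectory term vanishes and only the task loss remains; the term grows as the outputs drift between the two points on the update trajectory, which is the quantity the abstract says the method penalizes.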

updated: Thu Mar 02 2023 06:39:34 GMT+0000 (UTC)

published: Fri May 27 2022 16:32:43 GMT+0000 (UTC)

arXiv
