Efficient Generalization Improvement Guided by Random Weight Perturbation

Tao Li; Weihao Yan; Zehao Lei; Yingwen Wu; Kun Fang; Ming Yang; Xiaolin Huang

To fully uncover the great potential of deep neural networks (DNNs), various learning algorithms have been developed to improve the model's generalization ability. Recently, sharpness-aware minimization (SAM) established a generic scheme for improving generalization by minimizing a sharpness measure within a small neighborhood, achieving state-of-the-art performance. However, SAM requires two consecutive gradient evaluations to solve the min-max problem, which inevitably doubles the training time. In this paper, we resort to filter-wise random weight perturbations (RWP) to decouple the nested gradients in SAM. Unlike the small adversarial perturbations in SAM, RWP is softer and allows perturbations of much larger magnitude. Specifically, we jointly optimize the loss function under random perturbations and the original loss function: the former guides the network towards a wider flat region, while the latter helps recover the necessary local information. These two loss terms are complementary and mutually independent, so the corresponding gradients can be computed efficiently in parallel, enabling nearly the same training speed as regular training. As a result, we achieve very competitive performance on CIFAR and remarkably better performance on ImageNet (e.g., +1.1%) compared with SAM, while always requiring only half of the training time. The code is released at https://github.com/nblt/RWP.
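
The joint objective described above can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation (see the repository linked above for that). The toy linear model, the treatment of each weight row as a "filter", the noise scale `sigma`, the mixing coefficient `lam`, and all function names are assumptions made for illustration only. The point it tries to show is that the gradient at the perturbed weights and the gradient at the original weights do not depend on each other, unlike the two sequential gradient evaluations in SAM.

```python
import math
import random


def mse_loss_and_grad(weights, inputs, targets):
    """Mean squared error of a linear model y = W x and its gradient w.r.t. W."""
    n_out, n_in = len(weights), len(weights[0])
    grad = [[0.0] * n_in for _ in range(n_out)]
    loss = 0.0
    for x, t in zip(inputs, targets):
        for i in range(n_out):
            err = sum(w * xi for w, xi in zip(weights[i], x)) - t[i]
            loss += err * err
            for j in range(n_in):
                grad[i][j] += 2.0 * err * x[j]
    scale = 1.0 / len(inputs)
    return loss * scale, [[g * scale for g in row] for row in grad]


def filterwise_noise(weights, sigma, rng):
    """Gaussian noise per row ("filter") with std sigma * ||row|| (assumed scaling)."""
    noise = []
    for row in weights:
        norm = math.sqrt(sum(w * w for w in row))
        noise.append([rng.gauss(0.0, sigma * norm) for _ in row])
    return noise


def rwp_step(weights, inputs, targets, rng, lr=0.05, sigma=0.1, lam=0.5):
    """One update using lam * grad L(w + eps) + (1 - lam) * grad L(w)."""
    eps = filterwise_noise(weights, sigma, rng)
    perturbed = [[w + e for w, e in zip(wr, er)] for wr, er in zip(weights, eps)]
    # These two evaluations are independent, so they could run in parallel.
    _, g_clean = mse_loss_and_grad(weights, inputs, targets)
    _, g_noisy = mse_loss_and_grad(perturbed, inputs, targets)
    return [
        [w - lr * (lam * gn + (1.0 - lam) * gc) for w, gn, gc in zip(wr, gnr, gcr)]
        for wr, gnr, gcr in zip(weights, g_noisy, g_clean)
    ]


# Toy regression problem with a hypothetical ground-truth weight matrix.
rng = random.Random(0)
true_w = [[1.0, -2.0], [0.5, 0.3]]
inputs = [[rng.uniform(-1.0, 1.0) for _ in range(2)] for _ in range(32)]
targets = [[sum(w * xi for w, xi in zip(row, x)) for row in true_w] for x in inputs]

weights = [[0.1, 0.1], [0.1, 0.1]]
initial_loss, _ = mse_loss_and_grad(weights, inputs, targets)
for _ in range(200):
    weights = rwp_step(weights, inputs, targets, rng)
final_loss, _ = mse_loss_and_grad(weights, inputs, targets)
print(initial_loss, final_loss)
```

In this sketch the two gradient calls are made one after the other for simplicity; because neither needs the other's result, a real training setup could in principle dispatch them to separate devices, which is the property the abstract credits for the near-regular training speed.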

updated: Mon Nov 21 2022 14:24:34 GMT+0000 (UTC)

published: Mon Nov 21 2022 14:24:34 GMT+0000 (UTC)

arXiv
