Neural networks with late-phase weights

Johannes von Oswald; Seijin Kobayashi; Alexander Meulemans; Christian Henning; Benjamin F. Grewe; João Sacramento

The dominant and largely successful way to train neural networks is to learn their weights with some variant of stochastic gradient descent (SGD). Here, we show that the solutions found by SGD can be further improved by ensembling a subset of the weights in late stages of learning. At the end of learning, we recover a single model by taking a spatial average in weight space. To avoid increasing computational costs, we investigate a family of low-dimensional late-phase weight models that interact multiplicatively with the remaining parameters. Our results show that augmenting standard models with late-phase weights improves generalization on established benchmarks such as CIFAR-10/100, ImageNet and enwik8. These findings are complemented by a theoretical analysis of a noisy quadratic problem, which provides a simplified picture of the late phases of neural network learning.
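The abstract does not spell out the training loop, so the following is only a minimal NumPy sketch of the general idea under our own simplifying assumptions. It uses a linear model whose effective weights are `w * g`: the vector `g` plays the role of low-dimensional late-phase weights that interact multiplicatively with the shared weights `w`. After step `T0`, `K` copies of `g` are trained, the shared weights receive the gradient averaged over the copies, and at the end the copies are averaged in weight space to give back a single model. The function name `late_phase_sgd`, the noise used to initialise the copies, the member-averaged update of `w`, and all hyperparameters are hypothetical choices for illustration, not the authors' algorithm or settings.

```python
import numpy as np


def late_phase_sgd(X, y, K=4, T0=200, T=400, lr=0.05, batch=16, noise=0.1, seed=0):
    """Hypothetical sketch: SGD on y_hat = X @ (w * g) with late-phase gains g.

    Assumptions (not from the source): K perturbed copies of g are created at
    step T0, each copy is updated with its own gradient, and the shared
    weights w are updated with the gradient averaged over the copies.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = rng.normal(scale=0.1, size=d)  # shared ("base") weights
    g = np.ones(d)                     # multiplicative late-phase weights
    gains = None                       # K copies of g, created at step T0

    def grads(Xb, yb, w, g):
        # Loss 0.5 * mean((Xb @ (w * g) - yb) ** 2); gradient w.r.t. w * g ...
        r = Xb @ (w * g) - yb
        gv = Xb.T @ r / len(yb)
        # ... then the chain rule gives the gradients w.r.t. w and g.
        return gv * g, gv * w

    for t in range(T):
        idx = rng.choice(n, size=batch, replace=False)
        Xb, yb = X[idx], y[idx]
        if t < T0:
            # Early phase: ordinary SGD on a single model.
            gw, gg = grads(Xb, yb, w, g)
            w -= lr * gw
            g -= lr * gg
        else:
            if gains is None:
                # Start the ensemble by perturbing the current gains.
                gains = g + noise * rng.normal(size=(K, d))
            gw_sum = np.zeros(d)
            for k in range(K):
                gw, gg = grads(Xb, yb, w, gains[k])
                gains[k] -= lr * gg    # each member updates its own gains
                gw_sum += gw
            w -= lr * gw_sum / K       # shared weights: member-averaged gradient
    if gains is not None:
        g = gains.mean(axis=0)         # average in weight space -> single model
    return w, g


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.normal(size=(256, 5))
    w_true = np.array([1.0, -2.0, 0.5, 0.0, 3.0])
    y = X @ w_true + 0.01 * rng.normal(size=256)
    w, g = late_phase_sgd(X, y)
    print("training MSE of the averaged model:", np.mean((X @ (w * g) - y) ** 2))
```

Because only the small vector `g` is duplicated, the extra memory in this sketch scales with `K * d` rather than with the full parameter count, which mirrors the motivation for low-dimensional late-phase weights given above.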

updated: Mon Apr 11 2022 13:55:43 GMT+0000 (UTC)

published: Sat Jul 25 2020 13:23:37 GMT+0000 (UTC)

arXiv