Scaling Forward Gradient With Local Losses

Mengye Ren; Simon Kornblith; Renjie Liao; Geoffrey Hinton

Forward gradient learning computes a noisy directional gradient and is a biologically plausible alternative to backprop for learning deep neural networks. However, the standard forward gradient algorithm, when applied naively, suffers from high variance when the number of parameters to be learned is large. In this paper, we propose a series of architectural and algorithmic modifications that together make forward gradient learning practical for standard deep learning benchmark tasks. We show that it is possible to substantially reduce the variance of the forward gradient estimator by applying perturbations to activations rather than weights. We further improve the scalability of forward gradient by introducing a large number of local greedy loss functions, each of which involves only a small number of learnable parameters, and a new MLPMixer-inspired architecture, LocalMixer, that is more suitable for local learning. Our approach matches backprop on MNIST and CIFAR-10 and significantly outperforms previously proposed backprop-free algorithms on ImageNet.
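The variance argument in the abstract can be illustrated on a toy problem. The sketch below is not the authors' implementation: it assumes a single linear layer z = W x with squared-error loss L = 0.5 * ||z - t||^2, and every function name in it is hypothetical. It compares a forward gradient estimate built from a random perturbation of the weights with one built from a random perturbation of the activations z, where the activation estimate is mapped to the weights through the local relation dL/dW = (dL/dz) x^T. The directional derivatives are written out by hand for this layer; a real system would obtain them from forward-mode automatic differentiation.

```python
import random


def linear(W, x):
    """Compute z = W x for a weight matrix W (list of rows) and input vector x."""
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) for row in W]


def true_grad(W, x, t):
    """Exact gradient of L = 0.5 * ||W x - t||^2 w.r.t. W (reference only)."""
    z = linear(W, x)
    return [[(z_i - t_i) * x_j for x_j in x] for z_i, t_i in zip(z, t)]


def weight_perturbed(W, x, t, rng):
    """Forward gradient estimate using a Gaussian perturbation V of the weights."""
    z = linear(W, x)
    V = [[rng.gauss(0.0, 1.0) for _ in x] for _ in W]
    dz = linear(V, x)  # tangent of z along direction V
    # Directional derivative of the loss along V (no backward pass needed).
    d = sum((z_i - t_i) * dz_i for z_i, t_i, dz_i in zip(z, t, dz))
    return [[d * v_ij for v_ij in row] for row in V]


def activity_perturbed(W, x, t, rng):
    """Forward gradient estimate using a Gaussian perturbation u of the activations."""
    z = linear(W, x)
    u = [rng.gauss(0.0, 1.0) for _ in z]
    # Directional derivative of the loss along u in activation space.
    d = sum((z_i - t_i) * u_i for z_i, t_i, u_i in zip(z, t, u))
    # d * u estimates dL/dz; the outer product with x gives the weight estimate.
    return [[d * u_i * x_j for x_j in x] for u_i in u]


def mean_squared_error(estimator, W, x, t, n_samples=2000, seed=0):
    """Average squared distance between the estimate and the exact gradient."""
    rng = random.Random(seed)
    G = true_grad(W, x, t)
    total = 0.0
    for _ in range(n_samples):
        E = estimator(W, x, t, rng)
        total += sum((e - g) ** 2
                     for E_row, G_row in zip(E, G)
                     for e, g in zip(E_row, G_row))
    return total / n_samples
```

Both estimators are unbiased in this setting, but the weight-space perturbation has as many dimensions as W has entries, while the activation-space perturbation has only as many as the layer has outputs. Calling `mean_squared_error` with each estimator on a layer that has many more inputs than outputs should therefore show a noticeably smaller error for `activity_perturbed`.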

updated: Thu Mar 02 2023 03:08:10 GMT+0000 (UTC)

published: Fri Oct 07 2022 03:52:27 GMT+0000 (UTC)

arXiv
