Wide Neural Networks Forget Less Catastrophically

Seyed Iman Mirzadeh; Arslan Chaudhry; Huiyi Hu; Razvan Pascanu; Dilan Gorur; Mehrdad Farajtabar

A growing body of research in continual learning is devoted to overcoming the "catastrophic forgetting" of neural networks by designing new algorithms that are more robust to distribution shifts. While recent progress in the continual learning literature is encouraging, our understanding of which properties of neural networks contribute to catastrophic forgetting is still limited. To address this, instead of focusing on continual learning algorithms, in this work we focus on the model itself and study the impact of the "width" of the neural network architecture on catastrophic forgetting, showing that width has a surprisingly significant effect on forgetting. To explain this effect, we study the learning dynamics of the network from various perspectives, such as gradient norm and sparsity, orthogonalization, and the lazy training regime. We provide potential explanations that are consistent with the empirical results across different architectures and continual learning benchmarks.

updated: Thu Oct 21 2021 23:49:23 GMT+0000 (UTC)

published: Thu Oct 21 2021 23:49:23 GMT+0000 (UTC)

arXiv
