Sparsity and Heterogeneous Dropout for Continual Learning in the Null Space of Neural Activations

Ali Abbasi; Parsa Nooralinejad; Vladimir Braverman; Hamed Pirsiavash; Soheil Kolouri

Continual/lifelong learning from a non-stationary input data stream is a cornerstone of intelligence. Despite their phenomenal performance in a wide variety of applications, deep neural networks are prone to forgetting previously learned information when they learn new information. This phenomenon, called "catastrophic forgetting," is deeply rooted in the stability-plasticity dilemma. Overcoming catastrophic forgetting in deep neural networks has become an active field of research in recent years. In particular, gradient projection-based methods have recently shown exceptional performance at overcoming catastrophic forgetting. This paper proposes two biologically inspired mechanisms, based on sparsity and heterogeneous dropout, that significantly increase a continual learner's performance over a long sequence of tasks. Our proposed approach builds on the Gradient Projection Memory (GPM) framework. We leverage k-winner activations in each layer of a neural network to enforce layer-wise sparse activations for each task, together with a between-task heterogeneous dropout that encourages the network to use non-overlapping activation patterns for different tasks. In addition, we introduce two new benchmarks for continual learning under distributional shift, namely Continual Swiss Roll and ImageNet SuperDog-40. Lastly, we provide an in-depth analysis of our proposed method and demonstrate a significant performance boost on various benchmark continual learning problems.
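The abstract names three ingredients: k-winner (sparse) activations, between-task heterogeneous dropout, and the gradient projection of the GPM framework. The NumPy sketch below illustrates each one in isolation. It is a minimal example under stated assumptions, not the authors' implementation: all function names are hypothetical; the exponential keep probability `exp(-rho * count / max_count)` and the parameter `rho` are assumptions about how heterogeneous dropout could penalize units that were active on earlier tasks; and the projection uses the basic GPM rule of subtracting the gradient's component inside a stored representation subspace, omitting the full procedure for growing that subspace across tasks.

```python
import numpy as np


def k_winners(x: np.ndarray, k: int) -> np.ndarray:
    """k-winner-take-all: keep the k largest activations in each row, zero the rest."""
    out = np.zeros_like(x)
    # Indices of the k largest entries per row (their order among themselves is irrelevant).
    idx = np.argpartition(x, -k, axis=-1)[..., -k:]
    np.put_along_axis(out, idx, np.take_along_axis(x, idx, axis=-1), axis=-1)
    return out


def heterogeneous_keep_prob(counts: np.ndarray, rho: float) -> np.ndarray:
    """Per-unit keep probability: units that fired often on earlier tasks are kept less often.

    ASSUMPTION (hypothetical form): p = exp(-rho * counts / max(counts)).
    """
    max_count = counts.max()
    if max_count == 0:  # no history yet (first task): keep every unit
        return np.ones(counts.shape)
    return np.exp(-rho * counts / max_count)


def heterogeneous_dropout(x, counts, rho, rng):
    """Zero units of x (batch, units) with unit-specific probabilities; no rescaling is applied."""
    keep = heterogeneous_keep_prob(counts, rho)
    mask = rng.random(x.shape) < keep  # keep (units,) broadcasts across the batch dimension
    return x * mask


def gpm_basis(reps: np.ndarray, threshold: float) -> np.ndarray:
    """Orthonormal basis for a layer's input representations, reps: (in_features, n_samples).

    Keeps the fewest left singular vectors whose cumulative energy reaches `threshold`.
    """
    u, s, _ = np.linalg.svd(reps, full_matrices=False)
    energy = np.cumsum(s**2) / np.sum(s**2)
    r = min(int(np.searchsorted(energy, threshold)) + 1, len(s))
    return u[:, :r]


def project_gradient(grad_w: np.ndarray, basis: np.ndarray) -> np.ndarray:
    """GPM-style step: remove the part of grad_w (out, in) that lies in span(basis)."""
    return grad_w - grad_w @ basis @ basis.T


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    acts = k_winners(rng.standard_normal((32, 8)), k=2)  # sparse activations for one task
    counts = (acts != 0).sum(axis=0)  # how often each hidden unit was a winner
    dropped = heterogeneous_dropout(rng.standard_normal((32, 8)), counts, rho=2.0, rng=rng)
    basis = gpm_basis(rng.standard_normal((8, 3)), threshold=0.97)
    g = project_gradient(rng.standard_normal((4, 8)), basis)
    print((acts != 0).sum(axis=1).max(), np.abs(g @ basis).max())
```

After projection, `g @ basis` is numerically zero, so a weight update along `g` leaves the layer's output unchanged for any input lying in the stored subspace; the dropout step, in turn, lowers the chance that units used heavily on earlier tasks are reused, which is the kind of non-overlap the abstract describes.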

Updated: 2022-07-08 04:23:39 UTC

Published: 2022-03-12 21:12:41 UTC

arXiv