Gradient Projection Memory for Continual Learning

Gobinda Saha; Isha Garg; Kaushik Roy

The ability to learn continually without forgetting past tasks is a desirable attribute for artificial learning systems. Existing approaches to enable such learning in artificial neural networks usually rely on network growth, importance-based weight updates, or replay of old data from memory. In contrast, we propose a novel approach in which a neural network learns new tasks by taking gradient steps in directions orthogonal to the gradient subspaces deemed important for past tasks. After learning each task, we find the bases of these subspaces in a single shot by analyzing network representations (activations) with Singular Value Decomposition (SVD), and store them in memory as Gradient Projection Memory (GPM). With qualitative and quantitative analyses, we show that such orthogonal gradient descent induces minimal to no interference with past tasks, thereby mitigating forgetting. We evaluate our algorithm on diverse image classification datasets with short and long sequences of tasks and report better or on-par performance compared to state-of-the-art approaches.
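
The projection idea in the abstract can be illustrated with a minimal NumPy sketch for a single linear layer. This is not the authors' implementation: the function names `subspace_basis` and `project_orthogonal`, the `energy` threshold used to choose how many singular vectors to keep, and the random toy data are all assumptions made for illustration.

```python
import numpy as np


def subspace_basis(activations, energy=0.95):
    """Orthonormal basis (as columns) for the dominant subspace of `activations`.

    `activations` has shape (features, samples). The `energy` cutoff is a
    hypothetical selection rule: keep the fewest left singular vectors whose
    squared singular values sum to at least this fraction of the total.
    """
    U, S, _ = np.linalg.svd(activations, full_matrices=False)
    ratio = np.cumsum(S**2) / np.sum(S**2)
    k = min(int(np.searchsorted(ratio, energy)) + 1, len(S))
    return U[:, :k]


def project_orthogonal(grad, basis):
    """Remove from `grad` (outputs x features) its component in span(basis)."""
    return grad - grad @ basis @ basis.T


rng = np.random.default_rng(0)

# Toy "old task" inputs to the layer: 50 samples in a 3-dim subspace of R^10.
acts = rng.standard_normal((10, 3)) @ rng.standard_normal((3, 50))
M = subspace_basis(acts, energy=0.99)  # stored basis, playing the role of the memory

# Toy gradient of a new-task loss with respect to a (4 x 10) weight matrix.
grad = rng.standard_normal((4, 10))
grad_proj = project_orthogonal(grad, M)

# A step along grad_proj changes the layer's outputs on old inputs by
# grad_proj @ acts (up to the learning rate); this is small whenever the
# stored basis captures nearly all of the activation energy.
interference = np.abs(grad_proj @ acts).max()
```

The projected gradient is exactly orthogonal to the stored basis; how closely that translates into zero interference on old inputs depends on how much of the activation subspace the chosen cutoff retains.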

updated: Wed Mar 17 2021 16:31:29 GMT+0000 (UTC)

published: Wed Mar 17 2021 16:31:29 GMT+0000 (UTC)

arXiv
