ImpressLearn: Continual Learning via Combined Task Impressions

Dhrupad Bhardwaj; Julia Kempe; Artem Vysogorets; Angela M. Teng; Evaristus C. Ezekwem

This work proposes a new method for sequentially training deep neural networks on multiple tasks without suffering catastrophic forgetting, while endowing them with the capability to adapt quickly to unseen tasks. Starting from existing work on network masking (Wortsman et al., 2020), we show that simply learning a linear combination of a small number of task-specific supermasks (impressions) on a randomly initialized backbone network is sufficient both to retain accuracy on previously learned tasks and to achieve high accuracy on unseen tasks. In contrast to previous methods, we do not need to generate dedicated masks or contexts for each new task; instead, we leverage transfer learning to keep per-task parameter overhead small. Our work illustrates the power of linearly combining individual impressions, each of which fares poorly in isolation, to achieve performance comparable to a dedicated mask. Moreover, even repeated impressions from the same task (homogeneous masks), when combined, can approach the performance of heterogeneous combinations if sufficiently many impressions are used. Our approach scales more efficiently than existing methods, often requiring orders of magnitude fewer parameters, and can function without modification even when task identity is missing. In addition, in the setting where task labels are not given at inference, our algorithm often provides a favorable alternative to the one-shot procedure used by Wortsman et al. (2020). We evaluate our method on a number of well-known image classification datasets and network architectures.
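The core idea in the abstract, learning only the coefficients of a linear combination of stored supermasks over a frozen random backbone, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the single linear layer, the layer sizes, the randomly generated binary impressions, the uniform starting coefficients, and the names `combined_mask` and `forward` are all assumptions made for illustration. The abstract does not say how the coefficients are trained or whether the combined mask is thresholded, so neither is shown.

```python
import numpy as np

rng = np.random.default_rng(0)

# Frozen, randomly initialized backbone layer (never trained in this sketch).
in_dim, out_dim, num_impressions = 8, 4, 3
W = rng.standard_normal((out_dim, in_dim))

# Hypothetical stored impressions: one binary supermask per basis task.
# Real impressions would be learned; here they are random placeholders.
impressions = rng.integers(
    0, 2, size=(num_impressions, out_dim, in_dim)
).astype(float)


def combined_mask(alpha, impressions):
    """Return sum_k alpha[k] * impressions[k], with the shape of one mask."""
    return np.tensordot(alpha, impressions, axes=1)


def forward(x, alpha):
    """Apply the frozen layer with weights scaled elementwise by the combined mask."""
    masked_weights = W * combined_mask(alpha, impressions)
    return x @ masked_weights.T


# A new task would only learn `alpha`: one scalar per impression for this layer.
alpha = np.full(num_impressions, 1.0 / num_impressions)  # assumed starting point
x = rng.standard_normal((5, in_dim))
y = forward(x, alpha)

# Per-task parameter overhead for this layer under each scheme.
per_task_params_combination = alpha.size   # coefficients only
per_task_params_dedicated_mask = W.size    # one mask entry per weight
```

In this toy layer a new task adds 3 coefficients rather than a 32-entry dedicated mask, which is the kind of per-task saving the abstract refers to. The gap widens with layer size, since the coefficient count depends only on the number of impressions.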

updated: Tue Jan 31 2023 19:52:37 GMT+0000 (UTC)

published: Wed Oct 05 2022 02:28:25 GMT+0000 (UTC)

arXiv
