Exclusive Supermask Subnetwork Training for Continual Learning

Prateek Yadav; Mohit Bansal

Continual Learning (CL) methods focus on accumulating knowledge over time while avoiding catastrophic forgetting. Recently, Wortsman et al. (2020) proposed a CL method, SupSup, which uses a randomly initialized, fixed base network (model) and finds a supermask for each new task that selectively keeps or removes each weight to produce a subnetwork. Because the network weights are never updated, SupSup prevents forgetting. However, its performance is sub-optimal because the fixed weights restrict its representational power. Furthermore, there is no accumulation or transfer of knowledge inside the model when new tasks are learned. Hence, we propose ExSSNeT (Exclusive Supermask SubNEtwork Training), which performs exclusive and non-overlapping subnetwork weight training. This avoids conflicting updates to the shared weights by subsequent tasks, improving performance while still preventing forgetting. In addition, we propose a novel KNN-based Knowledge Transfer (KKT) module that utilizes previously acquired knowledge to learn new tasks better and faster. We demonstrate that ExSSNeT outperforms strong previous methods in both the NLP and vision domains while preventing forgetting. Moreover, ExSSNeT is particularly advantageous for sparse masks that activate 2-10% of the model parameters, resulting in an average improvement of 8.3% over SupSup. Finally, ExSSNeT scales to a large number of tasks (100). Our code is available at https://github.com/prateeky2806/exessnet.
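
The idea of exclusive, non-overlapping subnetwork weight training can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `exclusive_step` and `claim`, the flat list of weights, the hand-picked masks, and the constant gradients are all assumptions made for illustration. It only shows the bookkeeping the abstract describes: a task updates a weight only if its own mask selects that weight and no earlier task has already trained it.

```python
def exclusive_step(weights, grads, task_mask, claimed, lr=0.5):
    """One gradient step restricted to weights that the current task's mask
    selects and that no earlier task has already trained (hypothetical helper)."""
    return [
        w - lr * g if (m and not c) else w
        for w, g, m, c in zip(weights, grads, task_mask, claimed)
    ]


def claim(claimed, task_mask):
    """After a task finishes, freeze every weight its mask selected."""
    return [1 if (c or m) else 0 for c, m in zip(claimed, task_mask)]


# Toy setup (assumed values): four weights, none trained yet.
weights = [1.0, -1.0, 2.0, 0.0]
claimed = [0, 0, 0, 0]
grads = [1.0, 1.0, 1.0, 1.0]  # stand-in for real task gradients

# Task 1 selects weights 0 and 1 and is free to train both.
mask_task1 = [1, 1, 0, 0]
weights = exclusive_step(weights, grads, mask_task1, claimed)
claimed = claim(claimed, mask_task1)
weights_after_task1 = list(weights)

# Task 2 selects weights 1 and 2. Weight 1 is already claimed by task 1,
# so only weight 2 is updated.
mask_task2 = [0, 1, 1, 0]
weights = exclusive_step(weights, grads, mask_task2, claimed)
claimed = claim(claimed, mask_task2)
```

In this toy run, task 2's mask overlaps task 1's at index 1. Task 2 may still use that weight but does not change it, so the weights inside task 1's subnetwork are the same after task 2 as they were before, which is the sense in which forgetting is prevented.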

updated: Wed Jul 05 2023 16:57:43 GMT+0000 (UTC)

published: Tue Oct 18 2022 23:27:07 GMT+0000 (UTC)

arXiv
