Exclusive Supermask Subnetwork Training for Continual Learning

Prateek Yadav; Mohit Bansal

Continual Learning (CL) methods mainly focus on avoiding catastrophic forgetting and learning representations that are transferable to new tasks. Recently, Wortsman et al. (2020) proposed a CL method, SupSup, which uses a randomly initialized, fixed base network (model) and finds a supermask for each new task that selectively keeps or removes each weight to produce a subnetwork. Because the network weights are never updated, SupSup prevents forgetting. Although there is no forgetting, the performance of the supermask is sub-optimal because fixed weights restrict its representational power. Furthermore, there is no accumulation or transfer of knowledge inside the model when new tasks are learned. Hence, we propose ExSSNeT (Exclusive Supermask SubNEtwork Training), which performs exclusive and non-overlapping subnetwork weight training. This avoids conflicting updates to the shared weights by subsequent tasks, improving performance while still preventing forgetting. Furthermore, we propose a novel KNN-based Knowledge Transfer (KKT) module that dynamically initializes a new task's mask based on previous tasks to improve knowledge transfer. We demonstrate that ExSSNeT outperforms SupSup and other strong previous methods on both text classification and vision tasks while preventing forgetting. Moreover, ExSSNeT is particularly advantageous for sparse masks that activate 2-10% of the model parameters, resulting in an average improvement of 8.3% over SupSup. Additionally, ExSSNeT scales to a large number of tasks (100), and our KKT module helps to learn new tasks faster while improving overall performance. Our code is available at https://github.com/prateeky2806/exessnet.
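The idea of exclusive, non-overlapping subnetwork weight training can be illustrated with a small NumPy sketch. This is not the authors' implementation (see their repository for that); it only shows, under simplifying assumptions, how restricting updates to weights that the current task's mask selects and that no earlier task has claimed leaves earlier subnetworks untouched. The function `exclusive_step`, the `claimed` bookkeeping array, the random masks, and the random stand-in gradients are all hypothetical names and values chosen for illustration; in the method described above the masks are learned supermasks, not random draws.

```python
import numpy as np

def exclusive_step(weights, grad, task_mask, claimed, lr=0.1):
    """One gradient step restricted to weights that the current task's mask
    selects AND that no earlier task has already claimed (hypothetical helper)."""
    trainable = task_mask & ~claimed
    # Multiplying by the boolean array zeroes the update everywhere else.
    return weights - lr * grad * trainable

rng = np.random.default_rng(0)
weights = rng.standard_normal((4, 4))   # randomly initialized base weights
claimed = np.zeros((4, 4), dtype=bool)  # weights already trained by earlier tasks

# Assumption: random binary masks stand in for learned supermasks.
mask_1 = rng.random((4, 4)) < 0.3
mask_2 = rng.random((4, 4)) < 0.3

# Task 1: every weight under mask_1 is still free, so all of them may be updated.
grad_1 = rng.standard_normal((4, 4))    # stand-in for a real gradient
weights_after_1 = exclusive_step(weights, grad_1, mask_1, claimed)
claimed = claimed | mask_1

# Task 2: only weights under mask_2 that task 1 did not claim may be updated.
grad_2 = rng.standard_normal((4, 4))
weights_after_2 = exclusive_step(weights_after_1, grad_2, mask_2, claimed)
claimed = claimed | mask_2

# Task 1's subnetwork is unchanged by task 2's training.
print(np.array_equal(weights_after_2[mask_1], weights_after_1[mask_1]))
```

Task 2 may still read weights that task 1 trained wherever the two masks overlap, but it cannot modify them, which is why training a later task cannot degrade an earlier one in this sketch.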

updated: Tue Oct 18 2022 23:27:07 GMT+0000 (UTC)

published: Tue Oct 18 2022 23:27:07 GMT+0000 (UTC)

arXiv
