A Multi-Head Model for Continual Learning via Out-of-Distribution Replay

Gyuhak Kim; Zixuan Ke; Bing Liu

This paper studies class incremental learning (CIL), a setting of continual learning (CL). Many approaches have been proposed to deal with catastrophic forgetting (CF) in CIL. Most methods incrementally build a single classifier for all classes of all tasks in a single-head network. A popular way to prevent CF is to store a small number of samples from previous tasks and replay them while training the new task. However, this approach still suffers from serious CF because the parameters learned for previous tasks are updated or adjusted using only the limited number of samples saved in memory. This paper proposes an entirely different approach, called MORE, that uses a transformer network to build a separate classifier (head) for each task (a multi-head model). Whereas existing approaches use the saved samples to update the network for previous tasks/classes, MORE leverages them to build a task-specific classifier (adding a new classification head) without updating the network learned for previous tasks/classes. The model for the new task in MORE is trained both to learn the classes of the task and to detect samples that are not from the task's data distribution (i.e., out-of-distribution (OOD) samples). As a result, the classifier of the task to which a test instance belongs can produce a high score for the correct class, while the classifiers of other tasks produce low scores because the test instance is not from their data distributions. Experimental results show that MORE outperforms state-of-the-art baselines and is also naturally capable of performing OOD detection in the continual learning setting.
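The inference idea in the abstract (the head of the correct task scores high, the heads of other tasks score low) can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes, purely for illustration, that each task head emits one logit per class plus one extra OOD logit, and that the prediction is the class with the highest in-distribution probability across all heads. The names `softmax` and `predict`, and the example logits and class labels, are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict(head_logits, classes_per_head):
    """Pick the (task_id, class, score) with the highest in-distribution score.

    head_logits[i] holds the logits of head i; by assumption its last entry
    is an OOD output, which is dropped before comparing scores across heads.
    """
    best = (None, None, -1.0)
    for task_id, (logits, classes) in enumerate(zip(head_logits, classes_per_head)):
        probs = softmax(logits)[:-1]  # discard the assumed OOD probability
        for cls, p in zip(classes, probs):
            if p > best[2]:
                best = (task_id, cls, p)
    return best


# Hypothetical test instance that belongs to task 1: head 0 puts most of its
# mass on the OOD output, while head 1 is confident about one of its classes.
head_logits = [
    [0.2, 0.1, 3.0],  # head 0: classes "cat", "dog", then OOD
    [4.0, 0.5, 0.3],  # head 1: classes "car", "truck", then OOD
]
classes_per_head = [["cat", "dog"], ["car", "truck"]]

task_id, label, score = predict(head_logits, classes_per_head)
print(task_id, label, round(score, 3))
```

Because head 0 assigns most of its probability to the OOD output, its class scores stay low, so the argmax over all heads falls on a class of head 1 without any explicit task identifier being given at test time.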

updated: Sat Aug 20 2022 19:17:12 GMT+0000 (UTC)

published: Sat Aug 20 2022 19:17:12 GMT+0000 (UTC)

arXiv
