Mimicking the Oracle: An Initial Phase Decorrelation Approach for Class Incremental Learning

Yujun Shi; Kuangqi Zhou; Jian Liang; Zihang Jiang; Jiashi Feng; Philip Torr; Song Bai; Vincent Y. F. Tan

Class Incremental Learning (CIL) aims to learn a multi-class classifier phase by phase, with data from only a subset of the classes provided at each phase. Previous work mainly focuses on mitigating forgetting in the phases after the initial one. However, we find that improving CIL at its initial phase is also a promising direction. Specifically, we show experimentally that directly encouraging the CIL learner at the initial phase to output representations similar to those of a model jointly trained on all classes can greatly boost CIL performance. Motivated by this, we study the difference between a naively trained initial-phase model and the oracle model. Since one major difference between these two models is the number of training classes, we investigate how this difference affects the model representations. We find that, with fewer training classes, the data representations of each class lie in a long and narrow region; with more training classes, the representations of each class scatter more uniformly. Inspired by this observation, we propose Class-wise Decorrelation (CwD), which regularizes the representations of each class to scatter more uniformly, thus mimicking the model jointly trained on all classes (i.e., the oracle model). CwD is simple to implement and easy to plug into existing methods. Extensive experiments on various benchmark datasets show that CwD consistently and significantly improves the performance of existing state-of-the-art methods by around 1% to 3%. Code will be released.
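The abstract does not spell out the regularizer, so the following is only a minimal sketch of one way a class-wise decorrelation penalty could look. It assumes the penalty is the mean squared entry of each class's feature correlation matrix, averaged over classes. The function name `classwise_decorrelation_penalty`, the `eps` parameter, and the toy data are hypothetical and are not taken from the authors' code.

```python
import numpy as np


def classwise_decorrelation_penalty(features, labels, eps=1e-8):
    """Illustrative penalty (an assumption, not the authors' released code).

    For each class, standardize its features per dimension, form the
    d x d correlation matrix, and average its squared entries. The
    result is averaged over classes. Values near 1 mean the class lies
    along a narrow direction; values near 1/d mean it is spread evenly.
    """
    features = np.asarray(features, dtype=np.float64)
    labels = np.asarray(labels)
    d = features.shape[1]
    penalties = []
    for c in np.unique(labels):
        z = features[labels == c]
        if z.shape[0] < 2:
            continue  # correlation is undefined for a single sample
        z = (z - z.mean(axis=0)) / (z.std(axis=0) + eps)
        corr = z.T @ z / z.shape[0]
        penalties.append(np.sum(corr ** 2) / d ** 2)
    return float(np.mean(penalties)) if penalties else 0.0


# Toy data: one class stretched along a single direction, one isotropic.
rng = np.random.default_rng(0)
n, d = 2000, 8
t = rng.normal(size=(n, 1))
narrow = t * np.ones((1, d)) + 0.01 * rng.normal(size=(n, d))
uniform = rng.normal(size=(n, d))

penalty_narrow = classwise_decorrelation_penalty(narrow, np.zeros(n, dtype=int))
penalty_uniform = classwise_decorrelation_penalty(uniform, np.zeros(n, dtype=int))
```

On this toy data the elongated class receives a much larger penalty than the isotropic one. In a training loop, a term of this kind would presumably be added to the classification loss in the initial phase with a weighting coefficient, which is again an assumption here.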

updated: Thu Dec 09 2021 07:20:32 GMT+0000 (UTC)

published: Thu Dec 09 2021 07:20:32 GMT+0000 (UTC)

arXiv
