Do Pre-trained Models Benefit Equally in Continual Learning?

Kuan-Ying Lee; Yuanyi Zhong; Yu-Xiong Wang

Existing work on continual learning (CL) is primarily devoted to developing algorithms for models trained from scratch. Despite their encouraging performance on contrived benchmarks, these algorithms show dramatic performance drops in real-world scenarios. Therefore, this paper advocates the systematic introduction of pre-training to CL, which is a general recipe for transferring knowledge to downstream tasks but is substantially missing in the CL community. Our investigation reveals the multifaceted complexity of exploiting pre-trained models for CL along three different axes: pre-trained models, CL algorithms, and CL scenarios. Perhaps most intriguingly, improvements in CL algorithms from pre-training are very inconsistent: an underperforming algorithm could become competitive and even state-of-the-art when all algorithms start from a pre-trained model. This indicates that the current paradigm, where all CL methods are compared in from-scratch training, does not reflect the true CL objective and desired progress well. In addition, we make several other important observations, including that CL algorithms that exert less regularization benefit more from a pre-trained model, and that a stronger pre-trained model such as CLIP does not guarantee a better improvement. Based on these findings, we introduce a simple yet effective baseline that employs minimum regularization and leverages the more beneficial pre-trained model, coupled with a two-stage training pipeline. We recommend including this strong baseline in the future development of CL algorithms, due to its demonstrated state-of-the-art performance.

updated: Thu Oct 27 2022 18:03:37 GMT+0000 (UTC)

published: Thu Oct 27 2022 18:03:37 GMT+0000 (UTC)

arXiv
