Knowledge Distillation as Efficient Pre-training: Faster Convergence, Higher Data-efficiency, and Better Transferability

Ruifei He; Shuyang Sun; Jihan Yang; Song Bai; Xiaojuan Qi

Large-scale pre-training has proven crucial for various computer vision tasks. However, as the volume of pre-training data, the number of model architectures, and the amount of private or inaccessible data grow, pre-training every model architecture on large-scale datasets becomes inefficient and, in some cases, impossible. In this work, we investigate an alternative pre-training strategy, Knowledge Distillation as Efficient Pre-training (KDEP), which aims to efficiently transfer the learned feature representation from existing pre-trained models to new student models for future downstream tasks. We observe that existing Knowledge Distillation (KD) methods are unsuitable for pre-training because they normally distill the logits, which are discarded when the model is transferred to downstream tasks. To resolve this problem, we propose a feature-based KD method with non-parametric feature dimension aligning. Notably, our method performs comparably with supervised pre-training counterparts on 3 downstream tasks and 9 downstream datasets while requiring 10x less data and 5x less pre-training time. Code is available at https://github.com/CVMI-Lab/KDEP.
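
The idea of feature-based distillation with non-parametric dimension aligning can be illustrated with a small sketch. The abstract does not say how the alignment is performed, so the SVD-based projection and the mean-squared-error loss below are assumptions chosen for illustration, and the function names `align_teacher_features` and `feature_distillation_loss` are hypothetical. The sketch only shows one way a wider teacher feature could be matched to a narrower student feature without adding learnable layers; see the linked repository for the authors' actual implementation.

```python
import numpy as np


def align_teacher_features(teacher_feats, student_dim):
    """Reduce teacher features (N, D_t) to (N, student_dim) without learned parameters.

    Assumption: alignment is done by projecting onto the top principal
    directions obtained from an SVD of the centered teacher features.
    """
    centered = teacher_feats - teacher_feats.mean(axis=0, keepdims=True)
    # Rows of vt are principal directions, sorted by decreasing singular value.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:student_dim].T


def feature_distillation_loss(student_feats, aligned_teacher_feats):
    """Mean squared error between student and aligned teacher features (assumed loss)."""
    return float(np.mean((student_feats - aligned_teacher_feats) ** 2))


# Toy data standing in for penultimate-layer features of a batch of 32 images.
rng = np.random.default_rng(0)
teacher_feats = rng.normal(size=(32, 16))  # teacher feature width 16 (illustrative)
student_feats = rng.normal(size=(32, 8))   # student feature width 8 (illustrative)

aligned = align_teacher_features(teacher_feats, student_dim=8)
loss = feature_distillation_loss(student_feats, aligned)
print(aligned.shape, loss)
```

Because the projection is computed directly from the teacher features, it introduces no extra trainable weights. In a real training loop the loss would be backpropagated through the student only, with the teacher kept frozen.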

updated: Thu Mar 10 2022 06:23:41 GMT+0000 (UTC)

published: Thu Mar 10 2022 06:23:41 GMT+0000 (UTC)

arXiv
