Distilling from Similar Tasks for Transfer Learning on a Budget

Kenneth Borup; Cheng Perng Phoo; Bharath Hariharan

We address the challenge of obtaining efficient yet accurate recognition systems from limited labels. While recognition models improve with model size and amount of data, many specialized applications of computer vision have severe resource constraints during both training and inference. Transfer learning is an effective solution for training with few labels, but it often comes at the expense of computationally costly fine-tuning of large base models. We propose to mitigate this trade-off between compute and accuracy via semi-supervised cross-domain distillation from a set of diverse source models. First, we show how to use task similarity metrics to select a single suitable source model to distill from, and that a good selection process is imperative for good downstream performance of the target model. We dub this approach DistillNearest. Though effective, DistillNearest assumes that a single source model matches the target task, which is not always the case. To alleviate this, we propose a weighted multi-source distillation method, DistillWeighted, which distills multiple source models trained on different domains, each weighted by its relevance to the target task, into a single efficient model. Our methods need no access to source data; they require only the features and pseudo-labels of the source models. When the goal is accurate recognition under computational constraints, both DistillNearest and DistillWeighted outperform transfer learning from strong ImageNet initializations as well as state-of-the-art semi-supervised techniques such as FixMatch. Averaged over 8 diverse target tasks, our multi-source method outperforms these two baselines by 5.6 and 4.5 percentage points, respectively.
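
The two selection strategies named in the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation: the abstract does not specify the similarity metric, the weight normalization, or the loss, so the function names, the `power` parameter, the one-student-head-per-source layout, and the cross-entropy form below are all assumptions made for illustration.

```python
import numpy as np


def softmax(z, axis=-1):
    """Numerically stable softmax."""
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def nearest_source(similarities):
    """DistillNearest-style choice: index of the most similar source model."""
    return int(np.argmax(similarities))


def source_weights(similarities, power=1.0):
    """DistillWeighted-style weights (assumed form): clip negative scores,
    raise to `power`, and normalize so the weights sum to 1."""
    s = np.clip(np.asarray(similarities, dtype=float), 0.0, None) ** power
    if s.sum() == 0.0:
        raise ValueError("at least one similarity score must be positive")
    return s / s.sum()


def weighted_distill_loss(student_logits, teacher_probs, weights, eps=1e-12):
    """Weighted sum of per-source distillation losses.

    Assumption: the student has one output head per source model, since the
    sources may have different label spaces. Each list entry has shape
    (batch, num_classes_of_that_source); teacher_probs holds soft pseudo-labels.
    """
    total = 0.0
    for w, s_logits, t_probs in zip(weights, student_logits, teacher_probs):
        log_q = np.log(softmax(s_logits) + eps)
        # Cross-entropy between teacher soft labels and student predictions.
        total += w * -(t_probs * log_q).sum(axis=1).mean()
    return float(total)


# Hypothetical example: two source models with 3 and 5 classes.
rng = np.random.default_rng(0)
sims = [0.2, 0.6]
teachers = [softmax(rng.normal(size=(4, 3))), softmax(rng.normal(size=(4, 5)))]
students = [rng.normal(size=(4, 3)), rng.normal(size=(4, 5))]
loss = weighted_distill_loss(students, teachers, source_weights(sims))
```

Setting the weights to a one-hot vector at `nearest_source(sims)` reduces the weighted loss to single-source distillation, which is one way to view DistillNearest as a special case of DistillWeighted under these assumptions.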

updated: Mon Apr 24 2023 17:59:01 GMT+0000 (UTC)

published: Mon Apr 24 2023 17:59:01 GMT+0000 (UTC)

arXiv
