Effect of large-scale pre-training on full and few-shot transfer learning for natural and medical images

Mehdi Cherti; Jenia Jitsev

Transfer learning aims to exploit pre-trained models for more efficient follow-up training on a wide range of downstream tasks and datasets, enabling successful training even on small data. A recent line of work posits strong benefits for model generalization and transfer when model size, data size, and compute budget are increased for pre-training. However, it remains largely unclear whether the observed transfer improvement due to increased scale also holds when the source and target data distributions are far apart. In this work, we conduct large-scale pre-training on large source datasets of either natural (ImageNet-21k/1k) or medical chest X-ray images and compare full and few-shot transfer using different target datasets from both the natural and medical imaging domains. Our observations provide evidence that while pre-training and transfer on closely related datasets do show a clear benefit from increasing model and data size during pre-training, such benefits are not clearly visible when the source and target datasets are further apart. These observations hold across both full and few-shot transfer and indicate that scaling laws pointing to improved generalization and transfer with increasing model and data size are incomplete and should be revised to take into account the type and proximity of the source and target data, in order to correctly predict the effect of model and data scale during pre-training on transfer. Remarkably, in full-shot transfer to a large chest X-ray imaging target (PadChest), the largest model pre-trained on ImageNet-21k slightly outperforms the best models pre-trained on large chest X-ray imaging data. This indicates the possibility of obtaining high-quality models for domain-specific transfer even without access to large domain-specific data, by pre-training instead on comparably very large, generic source data.

updated: Wed Jun 09 2021 15:33:51 GMT+0000 (UTC)

published: Mon May 31 2021 21:55:56 GMT+0000 (UTC)

arXiv
