Data-Efficient Training of CNNs and Transformers with Coresets: A Stability Perspective

Animesh Gupta; Irtiza Hassan; Dilip K. Prasad; Deepak K. Gupta

Coreset selection is among the most effective ways to reduce the training time of CNNs; however, little is known about how the resulting models behave under variations in coreset size and in the choice of datasets and models. Moreover, given the recent paradigm shift towards transformer-based models, it remains an open question how coreset selection affects their performance. Several similar intriguing questions need to be answered before coreset selection methods can be widely accepted, and this paper attempts to answer some of them. We present a systematic benchmarking setup and perform a rigorous comparison of different coreset selection methods on CNNs and transformers. Our investigation reveals that, under certain circumstances, random selection of subsets is more robust and stable than state-of-the-art (SOTA) selection methods. We demonstrate that the conventional practice of sampling subsets uniformly across the classes of the data is not the appropriate choice; rather, samples should be chosen adaptively, based on the complexity of the data distribution of each class. Transformers are generally pretrained on large datasets, and we show that for certain target datasets, this pretraining helps keep their performance stable even at very small coreset sizes. We further show that when no pretraining is done, or when pretrained transformer models are used with non-natural images (e.g., medical data), CNNs tend to generalize better than transformers even at very small coreset sizes. Lastly, we demonstrate that in the absence of the right pretraining, CNNs are better at learning the semantic coherence between spatially distant objects within an image, and they tend to outperform transformers at almost all coreset sizes.
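
The abstract contrasts uniform, class-balanced random subset sampling with per-class budgets adapted to each class's complexity. The following is a minimal sketch of both allocation schemes under stated assumptions; it is not the authors' method. The function names are hypothetical, and the `weights` argument is an assumed stand-in for a per-class complexity score, which the abstract does not define.

```python
import random
from collections import defaultdict


def group_by_class(labels):
    """Map each class label to the list of sample indices carrying that label."""
    groups = defaultdict(list)
    for idx, y in enumerate(labels):
        groups[y].append(idx)
    return groups


def allocate_budget(class_sizes, budget, weights=None):
    """Split a total sample budget across classes.

    weights=None gives the uniform (class-balanced) scheme. Otherwise each class
    gets a share proportional to its weight; here the weight is a HYPOTHETICAL
    complexity score supplied by the caller, not a quantity from the paper.
    """
    classes = sorted(class_sizes)
    if weights is None:
        weights = {c: 1.0 for c in classes}
    total_w = sum(weights[c] for c in classes)
    raw = {c: budget * weights[c] / total_w for c in classes}
    quota = {c: int(raw[c]) for c in classes}
    # Hand out the slots lost to flooring, largest fractional part first.
    leftover = budget - sum(quota.values())
    by_frac = sorted(classes, key=lambda c: raw[c] - quota[c], reverse=True)
    for c in by_frac[:leftover]:
        quota[c] += 1
    # Never request more samples than a class actually contains.
    return {c: min(quota[c], class_sizes[c]) for c in classes}


def select_coreset(labels, fraction, weights=None, seed=0):
    """Randomly draw a coreset of about `fraction` of the data, per-class quotas."""
    rng = random.Random(seed)
    groups = group_by_class(labels)
    budget = int(round(fraction * len(labels)))
    sizes = {c: len(idx) for c, idx in groups.items()}
    quota = allocate_budget(sizes, budget, weights)
    chosen = []
    for c in sorted(groups):
        # Random selection within each class, as in the random baseline.
        chosen.extend(rng.sample(groups[c], quota[c]))
    return sorted(chosen)


if __name__ == "__main__":
    from collections import Counter

    labels = [0] * 50 + [1] * 50 + [2] * 50
    uniform = select_coreset(labels, 0.2)
    adaptive = select_coreset(labels, 0.2, weights={0: 1.0, 1: 1.0, 2: 2.0})
    print(Counter(labels[i] for i in uniform))   # 10 samples per class
    print(Counter(labels[i] for i in adaptive))  # class 2 receives twice the share
```

With equal weights the 20% budget is split evenly across classes; giving class 2 a larger (assumed) complexity weight shifts samples toward it while keeping the total budget fixed. How such weights should be computed is exactly the open design choice the abstract points to.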

updated: Fri Mar 03 2023 17:24:39 GMT+0000 (UTC)

published: Fri Mar 03 2023 17:24:39 GMT+0000 (UTC)

arXiv