Beyond neural scaling laws: beating power law scaling via data pruning

Ben Sorscher; Robert Geirhos; Shashank Shekhar; Surya Ganguli; Ari S. Morcos

Widely observed neural scaling laws, in which error falls off as a power of the training set size, model size, or both, have driven substantial performance improvements in deep learning. However, these improvements through scaling alone require considerable costs in compute and energy. Here we focus on the scaling of error with dataset size and show how, in theory, we can break beyond power law scaling and potentially even reduce it to exponential scaling, provided we have access to a high-quality data pruning metric that ranks the order in which training examples should be discarded to achieve any pruned dataset size. We then test this improved scaling prediction with pruned dataset size empirically, and indeed observe better than power law scaling in practice on ResNets trained on CIFAR-10, SVHN, and ImageNet. Next, given the importance of finding high-quality pruning metrics, we perform the first large-scale benchmarking study of ten different data pruning metrics on ImageNet. We find that most existing high-performing metrics scale poorly to ImageNet, while the best are computationally intensive and require labels for every image. We therefore develop a new simple, cheap, and scalable self-supervised pruning metric that demonstrates comparable performance to the best supervised metrics. Overall, our work suggests that the discovery of good data-pruning metrics may provide a viable path forward to substantially improved neural scaling laws, thereby reducing the resource costs of modern deep learning.
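The abstract describes a pruning metric as a ranking that fixes the order in which training examples are discarded, so that any pruned dataset size can be reached. A minimal sketch of that idea follows. The function name `prune_dataset`, the example scores, and the convention that the lowest-scoring examples are discarded first are assumptions made for illustration; they are not the authors' metric or implementation.

```python
import numpy as np


def prune_dataset(scores, keep_fraction):
    """Return the indices of the examples kept after pruning.

    scores: one hypothetical metric value per training example. By
            assumption in this sketch, lower scores are discarded first.
    keep_fraction: fraction of the dataset to retain, in (0, 1].
    """
    scores = np.asarray(scores, dtype=float)
    if not 0.0 < keep_fraction <= 1.0:
        raise ValueError("keep_fraction must be in (0, 1]")
    n_keep = max(1, int(round(keep_fraction * scores.size)))
    # Ascending stable sort: the front of this array is discarded first.
    discard_order = np.argsort(scores, kind="stable")
    # Keep the last n_keep entries of the discard order, sorted by index.
    return np.sort(discard_order[scores.size - n_keep:])


# Made-up scores for five examples; keep 60% of them.
scores = np.array([0.9, 0.1, 0.5, 0.7, 0.3])
print(prune_dataset(scores, keep_fraction=0.6))  # [0 2 3]
```

Because a single ranking determines every pruned size, the same scores can be reused to sweep over many values of `keep_fraction` when measuring how error scales with the pruned dataset size.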

updated: Fri Apr 21 2023 20:14:07 GMT+0000 (UTC)

published: Wed Jun 29 2022 09:20:47 GMT+0000 (UTC)

arXiv
