CLIP: Train Faster with Less Data

Muhammad Asif Khan; Ridha Hamila; Hamid Menouar

Deep learning models require an enormous amount of training data. Recently, however, machine learning has been shifting from model-centric to data-centric approaches, which focus on refining and improving the quality of the data, rather than redesigning model architectures, to improve learning performance. In this paper, we propose CLIP, i.e., Curriculum Learning with Iterative data Pruning. CLIP combines two data-centric approaches, curriculum learning and dataset pruning, to improve model accuracy and convergence speed. The proposed scheme applies loss-aware dataset pruning to iteratively remove the least significant samples, progressively reducing the size of the effective dataset during curriculum learning. Extensive experiments on crowd density estimation models validate the idea of combining the two approaches, showing reduced convergence time and improved generalization. To our knowledge, the idea of data pruning as an embedded process in curriculum learning is novel.
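
The abstract does not give algorithmic details, so the following is only a minimal sketch of how loss-aware pruning could be embedded in a curriculum loop. It rests on several assumptions that are not taken from the paper: the "least significant" samples are taken to be those with the lowest current loss, a fixed fraction is pruned after each stage, and each stage presents the remaining samples from easiest to hardest. The names `prune_lowest_loss`, `curriculum_with_pruning`, `loss_fn`, `train_fn`, `n_stages`, and `prune_fraction` are hypothetical.

```python
from typing import Callable, List, Sequence


def prune_lowest_loss(samples: Sequence, losses: Sequence[float],
                      prune_fraction: float) -> List:
    """Drop the lowest-loss fraction of samples, keeping the original order.

    Assumption: a low loss marks a sample as least significant.
    """
    n_drop = int(len(samples) * prune_fraction)
    by_loss = sorted(range(len(samples)), key=lambda i: losses[i])
    dropped = set(by_loss[:n_drop])
    return [s for i, s in enumerate(samples) if i not in dropped]


def curriculum_with_pruning(samples: Sequence,
                            loss_fn: Callable[[object], float],
                            train_fn: Callable[[List], None],
                            n_stages: int,
                            prune_fraction: float) -> List[int]:
    """Run n_stages of easy-to-hard training, pruning after each stage.

    Returns the effective dataset size used at each stage.
    """
    active = list(samples)
    sizes = []
    for _ in range(n_stages):
        # Re-score every remaining sample (in practice, with the current model).
        losses = [loss_fn(s) for s in active]
        # Curriculum ordering: lowest loss (assumed easiest) first.
        order = sorted(range(len(active)), key=lambda i: losses[i])
        train_fn([active[i] for i in order])
        sizes.append(len(active))
        # Shrink the effective dataset before the next stage.
        active = prune_lowest_loss(active, losses, prune_fraction)
    return sizes


# Toy usage: each "sample" is a number that doubles as its own loss.
toy_samples = [0.5, 0.1, 0.9, 0.3, 0.7, 0.2, 0.8, 0.4, 0.6, 1.0]
seen_batches: List[List[float]] = []
stage_sizes = curriculum_with_pruning(
    toy_samples, loss_fn=lambda s: s, train_fn=seen_batches.append,
    n_stages=3, prune_fraction=0.2)
print(stage_sizes)  # [10, 8, 7]
```

In a real training run, `loss_fn` would evaluate the current model on each sample and `train_fn` would run one or more epochs over the ordered data; the pruning criterion and schedule should be taken from the paper itself rather than from this sketch.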

updated: Mon Jul 17 2023 09:07:25 GMT+0000 (UTC)

published: Fri Dec 02 2022 21:29:48 GMT+0000 (UTC)

arXiv
