Where Should I Spend My FLOPS? Efficiency Evaluations of Visual Pre-training Methods

Skanda Koppula; Yazhe Li; Evan Shelhamer; Andrew Jaegle; Nikhil Parthasarathy; Relja Arandjelovic; João Carreira; Olivier Hénaff


Self-supervised methods have achieved remarkable success in transfer learning, often matching or exceeding the accuracy of supervised pre-training. Most prior work has done so by increasing pre-training computation through complex data augmentation, multiple views, or lengthy training schedules. In this work, we investigate a related but orthogonal question: given a fixed FLOP budget, what are the best datasets, models, and (self-)supervised training methods for obtaining high accuracy on representative visual tasks? Given the availability of large datasets, this setting is often more relevant for academic and industry labs alike. We examine five large-scale datasets (JFT-300M, ALIGN, ImageNet-1K, ImageNet-21K, and COCO) and six pre-training methods (CLIP, DINO, SimCLR, BYOL, Masked Autoencoding, and supervised). In a like-for-like fashion, we characterize their FLOP and CO2 footprints relative to their accuracy when transferred to a canonical image segmentation task. Our analysis reveals strong disparities in the computational efficiency of pre-training methods and in their dependence on dataset quality. In particular, our results call into question the commonly held assumption that self-supervised methods inherently scale to large, uncurated data. We therefore advocate for (1) paying closer attention to dataset curation and (2) reporting accuracies in the context of total computational cost.

updated: Tue Oct 18 2022 21:46:25 GMT+0000 (UTC)

published: Fri Sep 30 2022 17:04:55 GMT+0000 (UTC)

arXiv
