Vision Learners Meet Web Image-Text Pairs

Bingchen Zhao; Quan Cui; Hao Wu; Osamu Yoshie; Cheng Yang

Most recent self-supervised learning (SSL) methods are pre-trained on the well-curated ImageNet-1K dataset. In this work, motivated by the excellent scalability of web data, we consider SSL pre-training on noisy web image-text paired data. First, we conduct a benchmark study of representative SSL pre-training methods on large-scale web data under fair conditions. The methods include single-modal ones such as MAE and multi-modal ones such as CLIP. We observe that multi-modal methods cannot outperform single-modal ones on vision transfer learning tasks. We derive an information-theoretical view to explain the benchmarking results, which provides insights into designing novel vision learners. Inspired by these explorations, we present a visual representation pre-training method, MUlti-modal Generator (MUG), for scalable web image-text data. MUG achieves state-of-the-art transfer performance on a variety of tasks and shows promising scaling behavior. Models and code will be made public. A demo is available at https://huggingface.co/spaces/tennant/MUG_caption

updated: Tue Jan 17 2023 18:53:24 GMT+0000 (UTC)

published: Tue Jan 17 2023 18:53:24 GMT+0000 (UTC)

arXiv
