Vision Learners Meet Web Image-Text Pairs

Bingchen Zhao; Quan Cui; Hao Wu; Osamu Yoshie; Cheng Yang; Oisin Mac Aodha

Most recent self-supervised learning methods are pre-trained on the well-curated ImageNet-1K dataset. In this work, given the excellent scalability of web data, we consider self-supervised pre-training on noisy, web-sourced image-text paired data. First, we conduct a benchmark study of representative self-supervised pre-training methods on large-scale web data in a like-for-like setting. We compare a range of methods, including single-modal ones that use masked training objectives and multi-modal ones that use image-text contrastive training. We observe that existing multi-modal methods do not outperform their single-modal counterparts on vision transfer learning tasks. We derive an information-theoretic view to explain these benchmark results, which provides insight into how to design a novel vision learner. Inspired by this insight, we present a new visual representation pre-training method, MUlti-modal Generator (MUG), that learns from scalable web-sourced image-text data. MUG achieves state-of-the-art transfer performance on a variety of tasks and demonstrates promising scaling properties. Pre-trained models and code will be made public upon acceptance.
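For readers unfamiliar with the "image-text contrastive training" baselines mentioned above, the following is a minimal NumPy sketch of a generic CLIP-style symmetric InfoNCE loss. It is an illustrative assumption about that family of objectives, not the MUG method and not the authors' code. The function names and the temperature default of 0.07 (a commonly used initial value) are hypothetical choices made here.

```python
import numpy as np


def log_softmax(x: np.ndarray, axis: int) -> np.ndarray:
    # Numerically stable log-softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    return x - np.log(np.exp(x).sum(axis=axis, keepdims=True))


def clip_style_contrastive_loss(img_emb: np.ndarray,
                                txt_emb: np.ndarray,
                                temperature: float = 0.07) -> float:
    """Hypothetical symmetric InfoNCE loss over a batch of N image-text pairs.

    Row i of img_emb and row i of txt_emb are assumed to be a matched pair;
    every other row in the batch acts as a negative.
    """
    # L2-normalise so the dot product is cosine similarity.
    img = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    txt = txt_emb / np.linalg.norm(txt_emb, axis=1, keepdims=True)

    # (N, N) similarity matrix; the diagonal holds the matched pairs.
    logits = img @ txt.T / temperature
    idx = np.arange(logits.shape[0])

    # Image-to-text: each image must pick its own caption among all captions.
    loss_i2t = -log_softmax(logits, axis=1)[idx, idx].mean()
    # Text-to-image: each caption must pick its own image among all images.
    loss_t2i = -log_softmax(logits, axis=0)[idx, idx].mean()
    return float(0.5 * (loss_i2t + loss_t2i))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    print(clip_style_contrastive_loss(rng.normal(size=(8, 16)),
                                      rng.normal(size=(8, 16))))
```

Identical embeddings for every pair carry no discriminative signal, so the loss equals log N in that case. Perfectly aligned, mutually orthogonal pairs drive it toward zero.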

updated: Wed Apr 05 2023 16:22:17 GMT+0000 (UTC)

published: Tue Jan 17 2023 18:53:24 GMT+0000 (UTC)

arXiv