Pre-training Vision Transformers with Very Limited Synthesized Images

Ryo Nakamura; Hirokatsu Kataoka; Sora Takashima; Edgar Josafat Martinez Noriega; Rio Yokota; Nakamasa Inoue

Formula-driven supervised learning (FDSL) is a pre-training method that relies on synthetic images generated from mathematical formulae such as fractals. Prior work on FDSL has shown that pre-training vision transformers on such synthetic datasets can yield competitive accuracy on a wide range of downstream tasks. These synthetic images are categorized according to the parameters of the mathematical formulae that generate them. In the present work, we hypothesize that the process for generating different instances of the same category in FDSL can be viewed as a form of data augmentation. We validate this hypothesis by replacing the instances with data augmentation, which means we only need a single image per category. Our experiments show that this one-instance fractal database (OFDB) performs better than the original dataset where instances were explicitly generated. We further scale up OFDB to 21,000 categories and show that it matches, or even surpasses, the model pre-trained on ImageNet-21k in ImageNet-1k fine-tuning. The number of images in OFDB is 21k, whereas ImageNet-21k has 14M. This opens new possibilities for pre-training vision transformers with much smaller datasets.
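The one-image-per-category idea can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the fractal formula or the augmentations, so the iterated function system used for rendering, the flip-and-shift augmentation, and every name below (`make_category`, `render`, `augment`, `OneInstanceDataset`) are assumptions made for illustration only.

```python
import random


def make_category(seed, n_maps=3):
    """Hypothetical category definition: a few contractive affine maps.

    Each map is (a, b, c, d, e, f), meaning
    (x, y) -> (a*x + b*y + e, c*x + d*y + f).
    """
    rng = random.Random(seed)
    return [
        tuple(rng.uniform(-0.45, 0.45) for _ in range(4))
        + (rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0))
        for _ in range(n_maps)
    ]


def render(maps, size=32, n_points=2000, seed=0):
    """Render one binary image by iterating randomly chosen maps."""
    rng = random.Random(seed)
    x, y = 0.0, 0.0
    pts = []
    for i in range(n_points + 20):
        a, b, c, d, e, f = rng.choice(maps)
        x, y = a * x + b * y + e, c * x + d * y + f
        if i >= 20:  # skip the first few points before they settle
            pts.append((x, y))
    xs = [p[0] for p in pts]
    ys = [p[1] for p in pts]
    x0, x1, y0, y1 = min(xs), max(xs), min(ys), max(ys)
    img = [[0] * size for _ in range(size)]
    for px, py in pts:
        col = int((px - x0) / (x1 - x0 + 1e-12) * (size - 1))
        row = int((py - y0) / (y1 - y0 + 1e-12) * (size - 1))
        img[row][col] = 1
    return img


def augment(img, rng, max_shift=4):
    """Assumed augmentation: random horizontal flip plus a random shift."""
    size = len(img)
    src = [row[::-1] for row in img] if rng.random() < 0.5 else img
    dx = rng.randint(-max_shift, max_shift)
    dy = rng.randint(-max_shift, max_shift)
    out = [[0] * size for _ in range(size)]
    for r in range(size):
        for c in range(size):
            rr, cc = r + dy, c + dx
            if 0 <= rr < size and 0 <= cc < size:
                out[rr][cc] = src[r][c]
    return out


class OneInstanceDataset:
    """Stores exactly one image per category; variety comes from augment()."""

    def __init__(self, n_categories, size=32):
        self.images = [
            render(make_category(k), size=size, seed=k)
            for k in range(n_categories)
        ]
        self.rng = random.Random(1234)

    def __len__(self):
        return len(self.images)

    def __getitem__(self, label):
        # A fresh augmented view is produced on every access.
        return augment(self.images[label], self.rng), label


dataset = OneInstanceDataset(n_categories=5)
view, label = dataset[3]
```

In this sketch the number of stored images always equals the number of categories, which mirrors the 21k images for 21,000 categories mentioned in the abstract; repeated accesses to the same label return differently augmented views of the same stored image.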

updated: Mon Jul 31 2023 01:06:05 GMT+0000 (UTC)

published: Thu Jul 27 2023 08:58:39 GMT+0000 (UTC)

arXiv
