Escaping the Big Data Paradigm with Compact Transformers

Ali Hassani; Steven Walton; Nikhil Shah; Abulikemu Abuduweili; Jiachen Li; Humphrey Shi

With the rise of Transformers as the standard for language processing, and their advancements in computer vision, there has been a corresponding growth in parameter size and amounts of training data. Because of this, many have come to believe that transformers are not suitable for small datasets. This trend raises concerns such as the limited availability of data in certain scientific domains and the exclusion of those with limited resources from research in the field. In this paper, we aim to present an approach for small-scale learning by introducing Compact Transformers. We show for the first time that, with the right size and convolutional tokenization, transformers can avoid overfitting and outperform state-of-the-art CNNs on small datasets. Our models are flexible in terms of model size, and can have as few as 0.28M parameters while achieving competitive results. Our best model reaches 98% accuracy when trained from scratch on CIFAR-10 with only 3.7M parameters, a significant improvement in data efficiency over previous Transformer-based models: it is over 10x smaller than other transformers and 15% the size of ResNet50 while achieving similar performance. CCT (our Compact Convolutional Transformer) also outperforms many modern CNN-based approaches, and even some recent NAS-based approaches. Additionally, we obtain a new SOTA result on Flowers-102 with 99.76% top-1 accuracy, and improve upon the existing baseline on ImageNet (82.71% accuracy with 29% as many parameters as ViT), as well as on NLP tasks. Our simple and compact design makes transformers more feasible to study for those with limited computing resources and/or small datasets, while extending existing research efforts in data-efficient transformers. Our code and pre-trained models are publicly available at https://github.com/SHI-Labs/Compact-Transformers.
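
To make "convolutional tokenization" concrete: rather than cutting the image into fixed patches, a small convolution-plus-pooling stack produces a feature map, and each spatial position of that map becomes one token. The sketch below only computes how many tokens such a stack would yield for a given input size. The kernel, stride, and padding values are assumptions chosen for illustration and are not stated in the abstract; `conv_out` and `num_tokens` are hypothetical helper names.

```python
def conv_out(size: int, kernel: int, stride: int, padding: int) -> int:
    """Output length along one spatial axis for a conv or pooling window."""
    return (size + 2 * padding - kernel) // stride + 1


def num_tokens(image_size: int, n_blocks: int = 1,
               conv=(3, 1, 1), pool=(3, 2, 1)) -> int:
    """Count tokens produced by n_blocks of (conv -> max-pool) on a square image.

    conv and pool are (kernel, stride, padding) tuples. The defaults are
    illustrative assumptions, not values taken from the abstract.
    """
    side = image_size
    for _ in range(n_blocks):
        side = conv_out(side, *conv)  # convolution step
        side = conv_out(side, *pool)  # pooling step downsamples the map
    # Each spatial position of the final feature map is one token.
    return side * side


if __name__ == "__main__":
    # CIFAR-10 images are 32x32 pixels.
    print(num_tokens(32))              # one conv/pool block
    print(num_tokens(32, n_blocks=2))  # two conv/pool blocks
```

Under these assumed settings, each additional block halves the side of the feature map, so the sequence length seen by the transformer encoder shrinks by roughly a factor of four per block.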

updated: Tue Jun 07 2022 19:25:30 GMT+0000 (UTC)

published: Mon Apr 12 2021 17:58:56 GMT+0000 (UTC)

arXiv

