AutoFormer: Searching Transformers for Visual Recognition

Minghao Chen; Houwen Peng; Jianlong Fu; Haibin Ling

Recently, pure transformer-based models have shown great potential for vision tasks such as image classification and detection. However, designing transformer networks is challenging. It has been observed that the depth, embedding dimension, and number of heads can strongly affect the performance of vision transformers. Previous models configure these dimensions by hand. In this work, we propose a new one-shot architecture search framework, AutoFormer, dedicated to vision transformer search. AutoFormer entangles the weights of different blocks in the same layer during supernet training. Thanks to this strategy, the trained supernet yields thousands of well-trained subnets. Specifically, the performance of these subnets with weights inherited from the supernet is comparable to that of the same subnets retrained from scratch. In addition, the searched models, which we refer to as AutoFormers, surpass recent state-of-the-art models such as ViT and DeiT. In particular, AutoFormer-tiny/small/base achieve 74.7%/81.7%/82.4% top-1 accuracy on ImageNet with 5.7M/22.9M/53.7M parameters, respectively. Lastly, we verify the transferability of AutoFormer by reporting its performance on downstream benchmarks and in distillation experiments. Code and models are available at https://github.com/microsoft/AutoML.
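The idea of entangling weights across candidate blocks can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that every candidate layer reuses the top-left slice of one weight matrix sized for the largest candidate, and that subnets are drawn uniformly at random. The names `SEARCH_SPACE`, `EntangledLinear`, and `sample_subnet`, as well as the listed depth, embedding, and head choices, are hypothetical and chosen only for illustration.

```python
import random

# Hypothetical search space; these values are illustrative assumptions,
# not the choices reported for AutoFormer.
SEARCH_SPACE = {
    "depth": [12, 13, 14],
    "embed_dim": [192, 216, 240],
    "num_heads": [3, 4],
}


class EntangledLinear:
    """Linear layer whose candidates share slices of one weight matrix."""

    def __init__(self, max_in, max_out, rng):
        self.max_in = max_in
        self.max_out = max_out
        # One matrix sized for the largest candidate; each row is one output unit.
        self.weight = [[rng.uniform(-0.1, 0.1) for _ in range(max_in)]
                       for _ in range(max_out)]

    def sub_weight(self, in_dim, out_dim):
        """Return the top-left (out_dim x in_dim) block used by a candidate."""
        if in_dim > self.max_in or out_dim > self.max_out:
            raise ValueError("candidate exceeds supernet dimensions")
        # Note: list slicing copies; a tensor library would give a shared view.
        return [row[:in_dim] for row in self.weight[:out_dim]]

    def forward(self, x, out_dim):
        """Apply the sliced weights to an input vector x of length in_dim."""
        w = self.sub_weight(len(x), out_dim)
        return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


def sample_subnet(rng):
    """Draw one architecture uniformly at random (an assumed sampling rule)."""
    return {name: rng.choice(choices) for name, choices in SEARCH_SPACE.items()}


rng = random.Random(0)
layer = EntangledLinear(max_in=240, max_out=240, rng=rng)
config = sample_subnet(rng)

small = layer.sub_weight(192, 192)
large = layer.sub_weight(240, 240)
# The smaller candidate's weights are exactly the top-left block of the larger one.
shared = all(small[i][j] == large[i][j] for i in range(192) for j in range(192))
print(config, shared)
```

Under these assumptions, a training step that updates the weights of one sampled subnet also updates the overlapping portion of every other candidate, which is one way a single supernet could leave many subnets trained at once.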

updated: Thu Jul 01 2021 17:59:30 GMT+0000 (UTC)

published: Thu Jul 01 2021 17:59:30 GMT+0000 (UTC)

arXiv
