BossNAS: Exploring Hybrid CNN-transformers with Block-wisely Self-supervised Neural Architecture Search

Changlin Li; Tao Tang; Guangrun Wang; Jiefeng Peng; Bing Wang; Xiaodan Liang; Xiaojun Chang

A myriad of recent breakthroughs in hand-crafted neural architectures for visual recognition have highlighted the urgent need to explore hybrid architectures built from diverse building blocks. Meanwhile, neural architecture search (NAS) methods are proliferating, with the expectation of reducing human effort. However, whether NAS methods can efficiently and effectively handle diversified search spaces with disparate candidates (e.g., CNNs and transformers) is still an open question. In this work, we present Block-wisely Self-supervised Neural Architecture Search (BossNAS), an unsupervised NAS method that addresses the problem of inaccurate architecture rating caused by the large weight-sharing space and biased supervision in previous methods. More specifically, we factorize the search space into blocks and use a novel self-supervised training scheme, named ensemble bootstrapping, to train each block separately before searching them as a whole towards the population center. Additionally, we present the HyTra search space, a fabric-like hybrid CNN-transformer search space with searchable down-sampling positions. On this challenging search space, our searched model, BossNet-T, achieves up to 82.2% accuracy on ImageNet, surpassing EfficientNet by 2.1% with comparable compute time. Moreover, our method achieves superior architecture rating accuracy, with Spearman correlations of 0.78 on the canonical MBConv search space with ImageNet and 0.76 on the NATS-Bench size search space with CIFAR-100, surpassing state-of-the-art NAS methods. Code and pretrained models are available at https://github.com/changlin31/BossNAS.
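The architecture rating accuracy above is reported as a Spearman correlation, that is, the agreement between the ranking a search method assigns to candidate architectures and their ranking by true accuracy. The following is a minimal sketch of that metric only, not the BossNAS implementation; the function names and all scores are hypothetical and chosen purely for illustration.

```python
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman correlation: the Pearson correlation of the two rank vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (var_x * var_y) ** 0.5


# Hypothetical data: scores a search method gives five candidate
# architectures, and the accuracies those candidates reach when
# trained from scratch. These numbers are invented, not from the paper.
search_scores = [0.31, 0.12, 0.45, 0.27, 0.40]
true_accuracy = [71.2, 68.9, 74.0, 72.5, 73.1]

rho = spearman(search_scores, true_accuracy)
print(round(rho, 3))  # 0.9
```

A value of 1.0 would mean the search method orders every candidate exactly as the ground truth does; in this toy example one adjacent pair is swapped, which gives 0.9.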

updated: Wed Mar 24 2021 16:35:30 GMT+0000 (UTC)

published: Tue Mar 23 2021 10:05:58 GMT+0000 (UTC)

arXiv
