Searching for TrioNet: Combining Convolution with Local and Global Self-Attention

Huaijin Pi; Huiyu Wang; Yingwei Li; Zizhang Li; Alan Yuille

Recently, self-attention operators have shown superior performance as stand-alone building blocks for vision models. However, existing self-attention models are often hand-designed, modified from CNNs, and obtained by stacking a single operator. The wider architecture space that combines different self-attention operators with convolution is rarely explored. In this paper, we explore this novel architecture space with weight-sharing Neural Architecture Search (NAS) algorithms. The resulting architecture is named TrioNet because it combines convolution, local self-attention, and global (axial) self-attention operators. To search this huge architecture space effectively, we propose Hierarchical Sampling for better training of the supernet. In addition, we propose a novel weight-sharing strategy, Multi-head Sharing, designed specifically for multi-head self-attention operators. On ImageNet classification, where self-attention performs better than convolution, our searched TrioNet, which combines self-attention and convolution, outperforms all stand-alone models with fewer FLOPs. Furthermore, on various small datasets, where we observe inferior performance for self-attention models, TrioNet is still able to match the best operator, which in this case is convolution. Our code is available at https://github.com/phj128/TrioNet.

updated: Mon Nov 15 2021 05:45:10 GMT+0000 (UTC)

published: Mon Nov 15 2021 05:45:10 GMT+0000 (UTC)

arXiv
