ShiftNAS: Improving One-shot NAS via Probability Shift

Mingyang Zhang; Xinyi Yu; Haodong Zhao; Linlin Ou

One-shot neural architecture search (one-shot NAS) has been proposed as a time-efficient approach for obtaining optimal subnet architectures and weights under different complexity constraints by training only once. However, the performance of subnets obtained by weight sharing is often inferior to that achieved by retraining. In this paper, we investigate this performance gap and attribute it to uniform sampling, a common approach in supernet training. Uniform sampling concentrates training resources on subnets of intermediate computational complexity, which are sampled with high probability. However, subnets in different complexity regions require different training strategies to reach optimal performance. To address this problem, we propose ShiftNAS, a method that adjusts the sampling probability based on subnet complexity. We achieve this by evaluating the performance variation of subnets of different complexity and by designing an architecture generator that accurately and efficiently provides subnets with the desired complexity. Both the sampling probability and the architecture generator can be trained end-to-end in a gradient-based manner. With ShiftNAS, we can directly obtain the optimal model architecture and parameters for a given computational complexity. We evaluate our approach on multiple visual network models, including convolutional neural networks (CNNs) and vision transformers (ViTs), and demonstrate that ShiftNAS is model-agnostic. Experimental results on ImageNet show that ShiftNAS improves the performance of one-shot NAS without additional resource consumption. Source code is available at https://github.com/bestfleer/ShiftNAS.
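
The claim that uniform sampling concentrates training on subnets of intermediate complexity can be illustrated with a small simulation. The sketch below is not the authors' code: it assumes a hypothetical toy search space in which every layer independently picks one of a few options, and each option's cost (a stand-in for FLOPs) is simply added up. The names `sample_subnet_cost`, `cost_histogram`, and `mass_in_middle` are illustrative only.

```python
import random
from collections import Counter


def sample_subnet_cost(num_layers, choices, rng):
    """Pick one option per layer uniformly at random and return the total cost.

    `choices` holds hypothetical per-layer costs (a stand-in for FLOPs).
    """
    return sum(rng.choice(choices) for _ in range(num_layers))


def cost_histogram(num_layers=12, choices=(1, 2, 3, 4), num_samples=20000, seed=0):
    """Histogram of total subnet cost under uniform per-layer sampling."""
    rng = random.Random(seed)
    return Counter(
        sample_subnet_cost(num_layers, choices, rng) for _ in range(num_samples)
    )


def mass_in_middle(hist, num_layers, choices, width=0.5):
    """Fraction of sampled subnets whose cost lies in the central `width`
    portion of the full cost range [min_cost, max_cost]."""
    lo = num_layers * min(choices)
    hi = num_layers * max(choices)
    mid = (lo + hi) / 2
    half = (hi - lo) * width / 2
    total = sum(hist.values())
    inside = sum(count for cost, count in hist.items() if abs(cost - mid) <= half)
    return inside / total


if __name__ == "__main__":
    hist = cost_histogram()
    frac = mass_in_middle(hist, num_layers=12, choices=(1, 2, 3, 4))
    print(f"fraction of samples in the central half of the cost range: {frac:.3f}")
```

Because the total cost is a sum of independent per-layer choices, it clusters around its mean, so the smallest and largest subnets in this toy space are rarely drawn. Under these assumptions, this is the imbalance that a complexity-dependent sampling probability would aim to correct.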

updated: Mon Jul 17 2023 07:53:23 GMT+0000 (UTC)

published: Mon Jul 17 2023 07:53:23 GMT+0000 (UTC)

arXiv
