Rethinking Architecture Selection in Differentiable NAS

Ruochen Wang; Minhao Cheng; Xiangning Chen; Xiaocheng Tang; Cho-Jui Hsieh

Differentiable Neural Architecture Search is one of the most popular Neural Architecture Search (NAS) methods for its search efficiency and simplicity, accomplished by jointly optimizing the model weight and architecture parameters in a weight-sharing supernet via gradient-based algorithms. At the end of the search phase, the operations with the largest architecture parameters will be selected to form the final architecture, with the implicit assumption that the values of architecture parameters reflect the operation strength. While much has been discussed about the supernet's optimization, the architecture selection process has received little attention. We provide empirical and theoretical analysis to show that the magnitude of architecture parameters does not necessarily indicate how much the operation contributes to the supernet's performance. We propose an alternative perturbation-based architecture selection that directly measures each operation's influence on the supernet. We re-evaluate several differentiable NAS methods with the proposed architecture selection and find that it is able to extract significantly improved architectures from the underlying supernets consistently. Furthermore, we find that several failure modes of DARTS can be greatly alleviated with the proposed selection method, indicating that much of the poor generalization observed in DARTS can be attributed to the failure of magnitude-based architecture selection rather than entirely the optimization of its supernet.
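
The contrast between the two selection rules described above can be illustrated with a minimal sketch. It is a toy, not the authors' implementation: the single-edge "supernet", the three candidate operations, the hand-picked architecture parameters in `ALPHA`, the validation data, and the choice to mask an operation without renormalizing the remaining weights are all assumptions made for illustration. Magnitude-based selection picks the operation with the largest architecture parameter, while the perturbation-based rule removes each operation in turn and picks the one whose removal increases the validation loss the most.

```python
import math


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical candidate operations on one edge of a toy supernet.
OPS = {
    "skip": lambda x: x,
    "double": lambda x: 2.0 * x,
    "zero": lambda x: 0.0,
}

# Hand-picked architecture parameters (assumed values, not learned
# and not taken from the paper).
ALPHA = {"skip": 1.0, "double": 0.5, "zero": 0.0}

# Toy validation set whose target function is y = 2x.
VAL_DATA = [(x, 2.0 * x) for x in (-2.0, -1.0, 1.0, 2.0)]


def supernet_loss(removed=None):
    """Mean squared error of the mixed edge, optionally with one op masked.

    Assumption: the masked op is simply dropped from the weighted sum;
    the softmax weights of the remaining ops are left unchanged.
    """
    names = list(OPS)
    weights = dict(zip(names, softmax([ALPHA[n] for n in names])))
    total = 0.0
    for x, y in VAL_DATA:
        pred = sum(weights[n] * OPS[n](x) for n in names if n != removed)
        total += (pred - y) ** 2
    return total / len(VAL_DATA)


def select_by_magnitude():
    """Pick the op with the largest architecture parameter."""
    return max(ALPHA, key=ALPHA.get)


def select_by_perturbation():
    """Pick the op whose removal increases validation loss the most."""
    base = supernet_loss()
    increase = {n: supernet_loss(removed=n) - base for n in OPS}
    return max(increase, key=increase.get)


magnitude_choice = select_by_magnitude()
perturbation_choice = select_by_perturbation()
print(magnitude_choice, perturbation_choice)
```

In this contrived setting the two rules disagree: `skip` has the largest parameter, yet removing `double` damages the toy supernet the most. The numbers are constructed to make that point and say nothing about how often the disagreement occurs in real supernets.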

updated: Tue Aug 10 2021 00:53:39 GMT+0000 (UTC)

published: Tue Aug 10 2021 00:53:39 GMT+0000 (UTC)

arXiv
