Training BatchNorm Only in Neural Architecture Search and Beyond

Yichen Zhu; Jie Du; Yuqin Zhu; Yi Wang; Zhicai Ou; Feifei Feng; Jian Tang

This work investigates the use of batch normalization in neural architecture search (NAS). Frankle et al. find that training only the BatchNorm parameters can achieve nontrivial performance. Furthermore, Chen et al. claim that training only BatchNorm can speed up the training of a one-shot NAS supernet by more than ten times. Critically, no prior effort has been made to understand 1) why training only BatchNorm can find well-performing architectures with reduced supernet training time, and 2) how a train-BN-only supernet differs from a standard-train supernet. We begin by showing that train-BN-only networks converge to the neural tangent kernel regime and, in theory, exhibit the same training dynamics as networks in which all parameters are trained. Our proof supports the claim that a supernet can be trained on BatchNorm only, with less training time. We then show empirically that a train-BN-only supernet gives convolutions an advantage over other operators, causing unfair competition between architectures. This is because only the convolution operator has a BatchNorm layer attached. Through experiments, we show that this unfairness makes the search algorithm prone to selecting models with convolutions. To solve this issue, we introduce fairness into the search space by placing a BatchNorm layer on every operator. However, we observe that the performance predictor of Chen et al. is inapplicable to the new search space. To this end, we propose a novel composite performance indicator that evaluates networks from three perspectives, expressivity, trainability, and uncertainty, each derived from theoretical properties of BatchNorm. We demonstrate the effectiveness of our approach on multiple NAS benchmarks (NAS-Bench-101, NAS-Bench-201) and search spaces (the DARTS search space and the MobileNet search space).
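As a rough illustration of the two ideas in the abstract (training only BatchNorm, and attaching a BatchNorm layer to every candidate operator), here is a minimal PyTorch sketch. It is not the authors' implementation: the helpers `with_batchnorm` and `freeze_all_but_batchnorm`, the three toy operators, the channel count, and the dummy loss are all assumptions made for the example.

```python
import torch
import torch.nn as nn


def with_batchnorm(op: nn.Module, channels: int) -> nn.Sequential:
    """Attach a BatchNorm layer to any candidate operator (hypothetical helper)."""
    return nn.Sequential(op, nn.BatchNorm2d(channels))


def freeze_all_but_batchnorm(model: nn.Module) -> int:
    """Leave only BatchNorm affine parameters trainable; return their count."""
    for p in model.parameters():
        p.requires_grad = False
    trainable = 0
    for m in model.modules():
        if isinstance(m, nn.modules.batchnorm._BatchNorm):
            for p in m.parameters(recurse=False):
                p.requires_grad = True
                trainable += p.numel()
    return trainable


channels = 8  # assumed width, chosen only to keep the example small

# Toy candidate operators; names and choices are illustrative only.
# Every operator, not just the convolution, gets its own BatchNorm.
candidates = nn.ModuleDict({
    "conv3x3": with_batchnorm(
        nn.Conv2d(channels, channels, 3, padding=1, bias=False), channels),
    "avg_pool": with_batchnorm(nn.AvgPool2d(3, stride=1, padding=1), channels),
    "skip": with_batchnorm(nn.Identity(), channels),
})

n_trainable = freeze_all_but_batchnorm(candidates)
n_total = sum(p.numel() for p in candidates.parameters())

# Only the BatchNorm scale and shift parameters are handed to the optimizer.
optimizer = torch.optim.SGD(
    [p for p in candidates.parameters() if p.requires_grad], lr=0.1)

# One dummy step on random data with a placeholder loss.
x = torch.randn(4, channels, 16, 16)
loss = sum(op(x).pow(2).mean() for op in candidates.values())
loss.backward()
optimizer.step()
```

In this sketch the convolution weights stay at their random initialization and receive no gradient, while each operator's BatchNorm scale and shift are updated, so parameter-free operators such as pooling and skip connections also have something to train.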

updated: Wed Dec 01 2021 04:09:09 GMT+0000 (UTC)

published: Wed Dec 01 2021 04:09:09 GMT+0000 (UTC)

arXiv
