The Nonlinearity Coefficient -- A Practical Guide to Neural Architecture Design

George Philipp

In essence, a neural network is an arbitrary differentiable, parametrized function. Choosing a neural network architecture for any task is as complex as searching the space of those functions. For the last few years, 'neural architecture design' has been largely synonymous with 'neural architecture search' (NAS), i.e. brute-force, large-scale search. NAS has yielded significant gains on practical tasks. However, NAS methods end up searching for a local optimum in a small neighborhood of architecture space around CNN- or LSTM-based architectures that often date back decades. In this work, we present a different and complementary approach to architecture design, which we term 'zero-shot architecture design' (ZSAD). We develop methods that can predict, without any training, whether an architecture will achieve a relatively high test or training error on a task after training. We then explain the error in terms of the architecture definition itself and develop tools for modifying the architecture based on this explanation. This confers an unprecedented level of control on the deep learning practitioner, who can make informed design decisions before the first line of code is written, even for tasks for which no prior art exists. Our first major contribution is to show that the 'degree of nonlinearity' of a neural architecture is a key causal driver behind its performance and a primary aspect of the architecture's model complexity. We introduce the 'nonlinearity coefficient' (NLC), a scalar metric for measuring nonlinearity. Via an extensive empirical study, we show that the value of the NLC in the architecture's randomly initialized state before training is a powerful predictor of test error after training, and that attaining a right-sized NLC is essential for attaining an optimal test error.
The NLC is also conceptually simple, well-defined for any feedforward network, and easy and cheap to compute; it has extensive theoretical, empirical and conceptual grounding, follows instructively from the architecture definition, and can be easily controlled via our 'nonlinearity normalization' algorithm. We argue that the NLC is the most powerful scalar statistic for architecture design specifically and neural network analysis in general. Our analysis is fueled by mean field theory, which we use to uncover the 'meta-distribution' of layers. Beyond the NLC, we uncover and flesh out a range of metrics and properties that have a significant explanatory influence on test and training error. We go on to explain the majority of the error variation across a wide range of randomly generated architectures with these metrics and properties. We compile our insights into a practical guide for architecture designers, which we argue can significantly shorten the trial-and-error phase of deep learning deployment. Our results are grounded in an experimental protocol that exceeds that of the vast majority of other deep learning studies in terms of carefulness and rigor. We study the impact of factors such as the dataset, learning rate, floating-point precision, loss function, statistical estimation error and batch inter-dependency on performance and other key properties. We promote research practices that we believe can significantly accelerate progress in architecture design research.
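The abstract does not state how the NLC is computed. As a rough illustration only, the sketch below assumes a Jacobian-based form, NLC = sqrt(E_x Tr(J(x) Cov_x J(x)^T) / Tr(Cov_f)), where J(x) is the network's input-output Jacobian, Cov_x is the input covariance and Cov_f is the output covariance. It also substitutes central finite differences for automatic differentiation. The helper names (`covariance`, `jacobian`, `nlc`) are hypothetical. Under this assumed definition an affine map scores exactly 1 in exact arithmetic, while a function that varies much faster locally than its overall output spread suggests scores well above 1.

```python
import math
import random


def covariance(rows):
    """Population covariance matrix of a list of equal-length vectors."""
    n, d = len(rows), len(rows[0])
    mu = [sum(r[j] for r in rows) / n for j in range(d)]
    return [[sum((r[i] - mu[i]) * (r[j] - mu[j]) for r in rows) / n
             for j in range(d)] for i in range(d)]


def jacobian(f, x, eps=1e-5):
    """k x d Jacobian of f: R^d -> R^k at x, via central finite differences."""
    cols = []
    for j in range(len(x)):
        xp, xm = list(x), list(x)
        xp[j] += eps
        xm[j] -= eps
        # Column j holds the partial derivatives of every output w.r.t. x[j].
        cols.append([(a - b) / (2 * eps) for a, b in zip(f(xp), f(xm))])
    return [list(row) for row in zip(*cols)]  # transpose d x k -> k x d


def nlc(f, inputs):
    """ASSUMED definition: sqrt(E_x Tr(J Cov_x J^T) / Tr(Cov_f)) over a sample."""
    cov_x = covariance(inputs)
    cov_f = covariance([f(x) for x in inputs])
    tr_cov_f = sum(cov_f[i][i] for i in range(len(cov_f)))
    d = len(inputs[0])
    total = 0.0
    for x in inputs:
        for row in jacobian(f, x):
            # Each Jacobian row r contributes r^T Cov_x r to Tr(J Cov_x J^T).
            total += sum(row[a] * cov_x[a][b] * row[b]
                         for a in range(d) for b in range(d))
    return math.sqrt((total / len(inputs)) / tr_cov_f)


if __name__ == "__main__":
    random.seed(0)
    xs = [[random.gauss(0, 1), random.gauss(0, 1)] for _ in range(100)]
    affine = lambda x: [2 * x[0] - x[1] + 0.5, x[0] + 3 * x[1]]
    print(nlc(affine, xs))  # close to 1.0 for any affine map
    us = [[random.uniform(-1, 1)] for _ in range(200)]
    print(nlc(lambda x: [math.sin(10 * x[0])], us))  # well above 1
```

For a real network one would compute J(x) with an autodiff Jacobian instead; finite differences are used here only to keep the sketch dependency-free.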

updated: Tue May 25 2021 20:47:43 GMT+0000 (UTC)

published: Tue May 25 2021 20:47:43 GMT+0000 (UTC)

arXiv
