ZerO Initialization: Initializing Residual Networks with only Zeros and Ones

Jiawei Zhao; Florian Schäfer; Anima Anandkumar

Deep neural networks are usually initialized with random weights, with the initial variance chosen to ensure stable signal propagation during training. However, there is no consensus on how to select the variance, and the choice becomes especially difficult as the number of layers grows. In this work, we replace the widely used random weight initialization with a fully deterministic initialization scheme, ZerO, which initializes residual networks with only zeros and ones. By augmenting standard ResNet architectures with a few extra skip connections and Hadamard transforms, ZerO allows training to start entirely from zeros and ones. This has several benefits, such as improved reproducibility (by reducing the variance across experimental runs) and the ability to train networks without batch normalization. Surprisingly, we find that ZerO achieves state-of-the-art performance on various image classification datasets, including ImageNet, which suggests that random weights may be unnecessary for modern network initialization.
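
The two ingredients named in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the abstract does not say how the zeros-and-ones weights and the Hadamard transforms are combined inside the network, so the helper names `partial_identity`, `hadamard`, and `matvec` are hypothetical, and the choice of a partial identity matrix as the zeros-and-ones pattern is an assumption made for illustration. The sketch only shows that such a weight matrix is fully deterministic and that a Hadamard matrix built by the Sylvester construction has mutually orthogonal rows.

```python
def partial_identity(rows, cols):
    """Return a rows x cols matrix with ones on the main diagonal, zeros elsewhere.

    Assumption: used here as one possible zeros-and-ones weight pattern.
    """
    return [[1 if i == j else 0 for j in range(cols)] for i in range(rows)]


def hadamard(n):
    """Build an n x n Hadamard matrix by the Sylvester construction.

    n must be a power of two; all entries are +1 or -1.
    """
    if n < 1 or n & (n - 1):
        raise ValueError("n must be a power of two")
    h = [[1]]
    while len(h) < n:
        # Block doubling: [[H, H], [H, -H]]
        h = [row + row for row in h] + [row + [-x for x in row] for row in h]
    return h


def matvec(m, v):
    """Multiply matrix m (list of rows) by vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


# A layer that widens 2 features to 4: the input is copied into the first
# two outputs and the remaining outputs start at zero.
w = partial_identity(4, 2)
x = [3.0, -1.0]
y = matvec(w, x)

# Building the same weights twice gives identical values, with no random seed.
assert partial_identity(4, 2) == w

h4 = hadamard(4)
```

Because nothing here depends on a random number generator, two runs start from exactly the same weights, which is the reproducibility property the abstract refers to.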

updated: Mon Oct 25 2021 06:17:33 GMT+0000 (UTC)

published: Mon Oct 25 2021 06:17:33 GMT+0000 (UTC)

arXiv
