Self-Normalizing Neural Networks

Günter Klambauer; Thomas Unterthiner; Andreas Mayr; Sepp Hochreiter

Deep Learning has revolutionized vision via convolutional neural networks (CNNs) and natural language processing via recurrent neural networks (RNNs). However, success stories of Deep Learning with standard feed-forward neural networks (FNNs) are rare. FNNs that perform well are typically shallow and therefore cannot exploit many levels of abstract representations. We introduce self-normalizing neural networks (SNNs) to enable high-level abstract representations. While batch normalization requires explicit normalization, neuron activations of SNNs automatically converge towards zero mean and unit variance. The activation functions of SNNs are "scaled exponential linear units" (SELUs), which induce self-normalizing properties. Using the Banach fixed-point theorem, we prove that activations close to zero mean and unit variance that are propagated through many network layers will converge towards zero mean and unit variance, even in the presence of noise and perturbations. This convergence property of SNNs makes it possible to (1) train deep networks with many layers, (2) employ strong regularization, and (3) make learning highly robust. Furthermore, for activations not close to unit variance, we prove an upper and a lower bound on the variance; thus, vanishing and exploding gradients are impossible. We compared SNNs with standard FNNs and other machine learning methods, such as random forests and support vector machines, on (a) 121 tasks from the UCI machine learning repository, (b) drug discovery benchmarks, and (c) astronomy tasks. SNNs significantly outperformed all competing FNN methods on the 121 UCI tasks, outperformed all competing methods on the Tox21 dataset, and set a new record on an astronomy dataset. The winning SNN architectures are often very deep. Implementations are available at: github.com/bioinf-jku/SNNs.
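The self-normalizing behaviour described in the abstract can be illustrated with a small numerical sketch. This is not the authors' implementation (see the repository linked above for that). The constants ALPHA and SCALE are the commonly quoted SELU parameters, and the helper propagate_variance is a hypothetical name introduced here. The sketch assumes each layer's pre-activations are Gaussian with mean 0 and variance equal to the previous layer's activation variance, which corresponds to weights that sum to 0 and whose squares sum to 1; real networks only approximate this.

```python
import math
import random

# SELU constants (rounded); with these, (mean 0, variance 1) is a fixed point
# of the layer-to-layer mapping under the assumptions stated in the lead-in.
ALPHA = 1.6732632423543772
SCALE = 1.0507009873554805


def selu(x: float) -> float:
    """SCALE * x for x > 0, otherwise SCALE * ALPHA * (exp(x) - 1)."""
    return SCALE * x if x > 0.0 else SCALE * ALPHA * math.expm1(x)


def propagate_variance(start_variance, layers, samples=20000, seed=0):
    """Monte Carlo estimate of activation (mean, variance) per layer.

    Hypothetical helper: models each pre-activation as N(0, variance of the
    previous layer's activations) instead of simulating actual weight matrices.
    """
    rng = random.Random(seed)
    variance = start_variance
    history = []
    for _ in range(layers):
        std = math.sqrt(variance)
        acts = [selu(rng.gauss(0.0, std)) for _ in range(samples)]
        mean = sum(acts) / samples
        variance = sum((a - mean) ** 2 for a in acts) / samples
        history.append((mean, variance))
    return history


if __name__ == "__main__":
    # Start well away from unit variance, both below and above it.
    for start in (0.25, 4.0):
        mean, var = propagate_variance(start, layers=30)[-1]
        print(f"start variance {start}: final mean {mean:+.3f}, variance {var:.3f}")
```

Under these assumptions, both starting points should drift towards mean 0 and variance 1 as depth grows, up to sampling noise. The Gaussian shortcut keeps the sketch short; a fuller experiment would draw weight matrices explicitly and include the alpha dropout used in the paper's experiments.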

updated: Thu Sep 07 2017 10:39:00 GMT+0000 (UTC)

published: Thu Jun 08 2017 11:14:24 GMT+0000 (UTC)

arXiv
