Learning Activation Functions: A new paradigm for understanding Neural Networks

Mohit Goyal; Rajan Goyal; Brejesh Lall

The scope of research on activation functions remains limited and is centered on improving the ease of optimization or the generalization quality of neural networks (NNs). However, developing a deeper understanding of deep learning requires a more careful look at the nonlinear components of NNs. In this paper, we aim to provide a generic form of activation function, along with appropriate mathematical grounding, to enable future insights into how NNs work. We propose "Self-Learnable Activation Functions" (SLAF), which are learned during training and can approximate most existing activation functions. An SLAF is given as a weighted sum of pre-defined basis elements, which can provide a good approximation of the optimal activation function. The coefficients of these basis elements allow a search over the entire space of continuous functions, which contains all the conventional activations. We propose various training routines that can be used to achieve good performance with SLAF-equipped neural networks (SLNNs). We prove that SLNNs can approximate any neural network with Lipschitz-continuous activations to arbitrary accuracy, highlighting their capacity and possible equivalence with standard NNs. Moreover, SLNNs can be represented entirely as a collection of finite-degree polynomials up to the very last layer, obviating several hyperparameters such as width and depth. Since the optimization of SLNNs remains a challenge, we show that using SLAF together with standard activations (such as ReLU) can improve performance with only a small increase in the number of parameters.
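The abstract describes an SLAF only as a weighted sum of pre-defined basis elements with learnable coefficients. The NumPy sketch below illustrates that idea under one stated assumption: the basis is taken to be the monomials 1, x, ..., x^d (the abstract does not name the basis here). The class `SLAF`, its methods, and the helper `compose` are hypothetical names for illustration, not the authors' code. It shows three things: fitting the coefficients to a conventional activation (tanh), with least squares standing in for training; the closed-form coefficient gradient, which follows because the activation is linear in its coefficients; and how, with a polynomial basis, a stack of affine layers and SLAFs collapses into a single finite-degree polynomial of the input.

```python
import numpy as np
from numpy.polynomial import polynomial as P


class SLAF:
    """Hypothetical sketch of f(x) = sum_k a[k] * x**k.

    Assumption: the pre-defined basis is the monomials 1, x, ..., x**degree.
    """

    def __init__(self, degree, rng=None):
        rng = np.random.default_rng(0) if rng is None else rng
        # One learnable coefficient per basis element.
        self.coeffs = rng.normal(scale=0.1, size=degree + 1)

    def basis(self, x):
        # Evaluate every basis element at x; the new last axis indexes k.
        x = np.asarray(x, dtype=float)
        return np.stack([x ** k for k in range(self.coeffs.size)], axis=-1)

    def __call__(self, x):
        # Weighted sum of the basis elements.
        return self.basis(x) @ self.coeffs

    def mse_grad(self, x, y):
        # f is linear in the coefficients: dMSE/da = 2/N * B^T (B a - y).
        B = self.basis(x)
        return 2.0 / len(y) * B.T @ (B @ self.coeffs - y)


def compose(outer, inner):
    """Coefficients of outer(inner(x)) via Horner's rule on power series."""
    result = np.array([outer[-1]])
    for c in outer[-2::-1]:
        result = P.polyadd(P.polymul(result, inner), [c])
    return result


xs = np.linspace(-2.0, 2.0, 401)

# 1) Fit a degree-9 SLAF to tanh (least squares stands in for training).
act = SLAF(degree=9)
act.coeffs, *_ = np.linalg.lstsq(act.basis(xs), np.tanh(xs), rcond=None)
max_err = np.max(np.abs(act(xs) - np.tanh(xs)))

# 2) The analytic coefficient gradient matches central finite differences.
probe = SLAF(degree=3)
y = np.sin(xs)
loss = lambda a: np.mean((probe.basis(xs) @ a - y) ** 2)
eps = 1e-6
fd = np.array([(loss(probe.coeffs + eps * e) - loss(probe.coeffs - eps * e)) / (2 * eps)
               for e in np.eye(probe.coeffs.size)])
grad_ok = np.allclose(fd, probe.mse_grad(xs, y), rtol=1e-5, atol=1e-7)

# 3) A scalar two-hidden-layer net with a cubic SLAF is one polynomial in x.
f = SLAF(degree=3)
w1, b1, w2, b2, w3, b3 = 0.8, 0.1, -0.5, 0.2, 1.5, -0.3


def forward(x):
    return w3 * f(w2 * f(w1 * x + b1) + b2) + b3


p1 = compose(f.coeffs, [b1, w1])                    # layer 1 as a polynomial
p2 = compose(f.coeffs, P.polyadd([b2], w2 * p1))    # layer 2 as a polynomial
p_out = P.polyadd([b3], w3 * p2)                    # affine output layer
poly_ok = np.allclose(P.polyval(xs, p_out), forward(xs))

print(f"max |SLAF - tanh| on [-2, 2]: {max_err:.2e}")
print("gradient check passed:", grad_ok)
print("network degree:", len(p_out) - 1, "| matches forward pass:", poly_ok)
```

With a degree-d polynomial basis, each SLAF layer multiplies the polynomial degree by d, so the toy network above has degree 3^2 = 9 in its input. A trainable version would update `coeffs` with the gradient from `mse_grad` (or autograd) jointly with the layer weights rather than solving least squares.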

updated: Wed Dec 09 2020 04:13:25 GMT+0000 (UTC)

published: Sun Jun 23 2019 01:54:36 GMT+0000 (UTC)

arXiv