Deeper Learning with CoLU Activation

Advait Vagerwal

In neural networks, non-linearity is introduced by activation functions. One commonly used activation function is the Rectified Linear Unit (ReLU). ReLU has been a popular choice of activation but has flaws. State-of-the-art functions such as Swish and Mish are now gaining attention as better choices because they address many of the flaws of other activation functions. CoLU is an activation function with properties similar to those of Swish and Mish. It is defined as f(x) = x / (1 - x e^(-(x + e^x))). It is smooth, continuously differentiable, unbounded above, bounded below, non-saturating, and non-monotonic. In experiments comparing CoLU with other activation functions, CoLU usually performed better than the other functions on deeper neural networks. When different neural networks were trained on MNIST with an incrementally increasing number of convolutional layers, CoLU retained the highest accuracy for more layers. On a smaller network with 8 convolutional layers, CoLU had the highest mean accuracy, closely followed by ReLU. On VGG-13 trained on Fashion-MNIST, CoLU had 4.20% higher accuracy than Mish and 3.31% higher accuracy than ReLU. On ResNet-9 trained on CIFAR-10, CoLU had 0.05% higher accuracy than Swish, 0.09% higher accuracy than Mish, and 0.29% higher accuracy than ReLU. It is observed that which activation function performs best may depend on factors including the number of layers, the types of layers, the number of parameters, the learning rate, and the optimizer. Further research on these factors could lead to better activation functions and a better understanding of their behavior.
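The definition in the abstract can be tried out directly. The following is a minimal scalar sketch, not the author's implementation: the function name `colu` and the sample points are chosen here for illustration only, and the naive formula is only evaluated on a moderate range because `math.exp` overflows for inputs of very large magnitude.

```python
import math


def colu(x: float) -> float:
    """CoLU as written in the abstract: f(x) = x / (1 - x * e^(-(x + e^x))).

    Naive scalar version for illustration; no overflow protection is
    attempted, so keep |x| moderate (roughly |x| < 100).
    """
    return x / (1.0 - x * math.exp(-(x + math.exp(x))))


if __name__ == "__main__":
    # Sample a few points to see the shape described in the abstract:
    # negative inputs give small negative outputs that dip and then rise
    # back toward zero (non-monotonic, bounded below), while positive
    # inputs grow roughly like x (unbounded above).
    for x in (-5.0, -2.0, -1.0, -0.5, 0.0, 1.0, 5.0):
        print(f"colu({x:+.1f}) = {colu(x):+.6f}")
```

For use in a training framework, the same expression would normally be written with that framework's tensor operations so that it vectorizes and differentiates automatically; the scalar version above is only meant to make the formula concrete.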

updated: Sat Dec 18 2021 21:11:11 GMT+0000 (UTC)

published: Sat Dec 18 2021 21:11:11 GMT+0000 (UTC)

arXiv
