Activated Gradients for Deep Neural Networks

Mei Liu; Liangming Chen; Xiaohao Du; Long Jin; Mingsheng Shang

Deep neural networks often suffer from poor performance or even training failure due to ill-conditioning, vanishing/exploding gradients, and saddle points. To handle these challenges, this paper proposes a novel method that applies a gradient activation function (GAF) to the gradient. Intuitively, the GAF enlarges tiny gradients and restricts large gradients. Theoretically, the paper gives the conditions that a GAF needs to meet and, on this basis, proves that the GAF alleviates the problems mentioned above. In addition, the paper proves that, under some assumptions, SGD with the GAF converges faster than SGD without it. Furthermore, experiments on CIFAR, ImageNet, and PASCAL Visual Object Classes confirm the GAF's effectiveness. The experimental results also demonstrate that the proposed method can be adopted in various deep neural networks to improve their performance. The source code is publicly available at https://github.com/LongJin-lab/Activated-Gradients-for-Deep-Neural-Networks.
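
To make the intuition concrete, the following is a minimal sketch of applying an activation function to each gradient component before an SGD update. The abstract does not state which function the authors use, so the form alpha * tanh(beta * g), the parameter names alpha and beta, and the helper names gaf and sgd_step are all assumptions chosen for illustration only. With alpha * beta > 1, the function has slope greater than 1 near zero (enlarging tiny gradients) and is bounded by alpha in magnitude (restricting large ones).

```python
import math


def gaf(g, alpha=1.0, beta=10.0):
    """Hypothetical gradient activation function (an assumption, not the paper's).

    Near zero, alpha * tanh(beta * g) is approximately alpha * beta * g,
    so tiny gradients are scaled up when alpha * beta > 1. For large |g|,
    tanh saturates, so the output magnitude never exceeds alpha.
    """
    return alpha * math.tanh(beta * g)


def sgd_step(params, grads, lr=0.1, activation=gaf):
    """One plain SGD update using activated gradients instead of raw ones."""
    return [p - lr * activation(g) for p, g in zip(params, grads)]


# A tiny gradient is enlarged, a huge one is capped at alpha.
tiny_raw, large_raw = 1e-3, 50.0
tiny_act, large_act = gaf(tiny_raw), gaf(large_raw)

# Example update on two parameters with very different gradient scales.
new_params = sgd_step([1.0, 1.0], [tiny_raw, large_raw])
```

In this sketch the two parameters move by comparable orders of magnitude even though their raw gradients differ by more than four orders, which is the behaviour the abstract describes. Whether a particular choice of function satisfies the conditions required by the paper's theory is not something this example checks.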

updated: Fri Jul 09 2021 06:00:55 GMT+0000 (UTC)

published: Fri Jul 09 2021 06:00:55 GMT+0000 (UTC)

arXiv

