Gradient-Coherent Strong Regularization for Deep Neural Networks

Dae Hoon Park; Chiu Man Ho; Yi Chang; Huaqing Zhang

Regularization plays an important role in the generalization of deep neural networks, which are often prone to overfitting because of their numerous parameters. L1 and L2 regularizers are common regularization tools in machine learning owing to their simplicity and effectiveness. However, we observe that imposing strong L1 or L2 regularization with stochastic gradient descent on deep neural networks easily fails, which limits the generalization ability of the underlying neural networks. To understand this phenomenon, we first investigate how and why learning fails when strong regularization is imposed on deep neural networks. We then propose a novel method, gradient-coherent strong regularization, which imposes regularization only when the gradients are kept coherent in the presence of strong regularization. Experiments are performed with multiple deep architectures on three benchmark data sets for image recognition. Experimental results show that our proposed approach indeed endures strong regularization and significantly improves both accuracy and compression (up to 9.9x), which could not be achieved otherwise.

updated: Fri Oct 18 2019 01:52:12 GMT+0000 (UTC)

published: Tue Nov 20 2018 03:41:56 GMT+0000 (UTC)

arXiv
