Accelerating CNN Training by Pruning Activation Gradients

Xucheng Ye; Pengcheng Dai; Junyu Luo; Xin Guo; Yingjie Qi; Jianlei Yang; Yiran Chen

Sparsification is an efficient approach to accelerating CNN inference, but exploiting sparsity during training is challenging because the gradients involved change dynamically. An important observation is that most activation gradients in back-propagation are very close to zero and have only a tiny impact on weight updates. Hence, we consider pruning these very small gradients randomly, according to the statistical distribution of activation gradients, to accelerate CNN training. We also theoretically analyze the impact of the pruning algorithm on convergence. The proposed approach is evaluated on AlexNet and ResNet-{18, 34, 50, 101, 152} with the CIFAR-{10, 100} and ImageNet datasets. Experimental results show that our training approach achieves up to 5.92× speedup in the back-propagation stage with negligible accuracy loss.
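As a rough illustration of what randomly pruning small gradients can look like, the sketch below applies one common unbiased stochastic rule. The abstract does not spell out the exact rule, so everything here is an assumption for illustration: the function name `stochastic_prune`, the threshold `tau`, and the choice to snap a small gradient to ±`tau` with probability |g|/`tau` and to zero otherwise. How `tau` would be derived from the gradient distribution is not shown.

```python
import random


def stochastic_prune(grads, tau, rng=None):
    """Randomly prune gradients with |g| < tau (illustrative rule, not the paper's code).

    A small gradient g becomes sign(g) * tau with probability |g| / tau and
    0 otherwise, so its expected value is still g. Gradients with
    |g| >= tau pass through unchanged. `tau` must be positive.
    """
    if tau <= 0:
        raise ValueError("tau must be positive")
    rng = rng or random.Random()
    pruned = []
    for g in grads:
        if abs(g) >= tau:
            pruned.append(g)                      # large gradient: keep as-is
        elif rng.random() < abs(g) / tau:
            pruned.append(tau if g > 0 else -tau)  # promote to +/- tau
        else:
            pruned.append(0.0)                    # prune to exactly zero
    return pruned


# Example: most of the tiny gradients become exact zeros.
example = stochastic_prune([0.5, -0.3, 0.001, -0.002, 0.0], tau=0.01,
                           rng=random.Random(0))
```

Under this rule, the zeros it produces are what a sparse back-propagation kernel could skip, and the expectation-preserving choice is one plausible reason the accuracy loss could stay small.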

updated: Mon Jul 20 2020 11:13:05 GMT+0000 (UTC)

published: Thu Aug 01 2019 01:48:11 GMT+0000 (UTC)

arXiv
