Sparse Training via Boosting Pruning Plasticity with Neuroregeneration

Shiwei Liu; Tianlong Chen; Xiaohan Chen; Zahra Atashgahi; Lu Yin; Huanyu Kou; Li Shen; Mykola Pechenizkiy; Zhangyang Wang; Decebal Constantin Mocanu

Work on the lottery ticket hypothesis (LTH) and single-shot network pruning (SNIP) has recently drawn considerable attention to post-training pruning (iterative magnitude pruning) and before-training pruning (pruning at initialization). The former suffers from an extremely large computation cost, and the latter usually struggles with insufficient performance. In comparison, during-training pruning, a class of pruning methods that simultaneously offers training/inference efficiency and comparable performance, has so far been less explored. To better understand during-training pruning, we quantitatively study the effect of pruning throughout training from the perspective of pruning plasticity (the ability of a pruned network to recover its original performance). Pruning plasticity can help explain several other empirical observations about neural network pruning in the literature. We further find that pruning plasticity can be substantially improved by injecting a brain-inspired mechanism called neuroregeneration, i.e., regenerating the same number of connections as were pruned. Based on the insights from pruning plasticity, we design a novel gradual magnitude pruning (GMP) method, named gradual pruning with zero-cost neuroregeneration (GraNet), and its dynamic sparse training (DST) variant (GraNet-ST). Both advance the state of the art. Perhaps most impressively, the latter for the first time boosts sparse-to-sparse training performance over various dense-to-sparse methods by a large margin with ResNet-50 on ImageNet. We will release all code.
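The neuroregeneration idea described above (prune some connections, then regenerate the same number) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function name `prune_and_regenerate`, the use of a `scores` array to rank inactive connections (assumed here to be gradient magnitudes), and the choice to initialize regrown weights at zero are all assumptions made for illustration, since the abstract does not specify them.

```python
import numpy as np


def prune_and_regenerate(weights, mask, scores, k):
    """Prune the k smallest-magnitude active weights, then regrow k inactive ones.

    Hypothetical helper for illustration only.
    weights: array of layer weights.
    mask:    boolean array, True where a connection is active.
    scores:  array ranking inactive connections for regrowth (assumed to be
             gradient magnitudes; the abstract does not state the criterion).
    """
    weights = np.array(weights, dtype=float)   # copy, C-contiguous
    mask = np.array(mask, dtype=bool)          # copy, C-contiguous
    flat_w = weights.reshape(-1)               # views into the copies
    flat_m = mask.reshape(-1)
    flat_s = np.abs(np.asarray(scores, dtype=float)).reshape(-1)

    active = np.flatnonzero(flat_m)
    # Regrowth candidates are connections that were inactive before pruning,
    # so a freshly pruned connection is not immediately regrown.
    inactive = np.flatnonzero(~flat_m)
    k = min(k, active.size, inactive.size)
    if k == 0:
        return weights, mask

    # Magnitude pruning: drop the k active weights with the smallest |w|.
    pruned = active[np.argsort(np.abs(flat_w[active]))[:k]]
    flat_m[pruned] = False
    flat_w[pruned] = 0.0

    # Regeneration: activate the k inactive connections with the largest score.
    grown = inactive[np.argsort(-flat_s[inactive])[:k]]
    flat_m[grown] = True
    flat_w[grown] = 0.0  # assumption: regrown connections start at zero

    return weights, mask


# Example: one prune-and-regrow step on a tiny 2x3 layer.
w = np.array([[0.5, -0.1, 0.0], [0.0, 2.0, -0.3]])
m = w != 0
g = np.array([[0.0, 0.0, 0.2], [0.9, 0.0, 0.0]])
new_w, new_m = prune_and_regenerate(w, m, g, k=1)
```

Because the number of regrown connections equals the number pruned, the overall sparsity is unchanged by this step; only the placement of the active connections moves.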

updated: Sat Jun 19 2021 02:09:25 GMT+0000 (UTC)

published: Sat Jun 19 2021 02:09:25 GMT+0000 (UTC)

arXiv
