When to Prune? A Policy towards Early Structural Pruning

Maying Shen; Pavlo Molchanov; Hongxu Yin; Jose M. Alvarez

Pruning enables appealing reductions in network memory footprint and time complexity. Conventional post-training pruning techniques lean towards efficient inference while overlooking the heavy computation required for training. Recent exploration of pre-training pruning at initialization hints at training cost reduction via pruning, but suffers from noticeable performance degradation. We attempt to combine the benefits of both directions and propose a policy that prunes as early as possible during training without hurting performance. Instead of pruning at initialization, our method exploits initial dense training for a few epochs to quickly guide the architecture, while constantly evaluating dominant sub-networks via neuron importance ranking. This unveils dominant sub-networks whose structures become stable, allowing conventional pruning to be pushed earlier into training. To do this early, we further introduce an Early Pruning Indicator (EPI) that relies on sub-network architectural similarity and quickly triggers pruning when the sub-network's architecture stabilizes. Through extensive experiments on ImageNet, we show that EPI enables quick tracking of early training epochs suitable for pruning, offering the same efficacy as an otherwise "oracle" grid search that scans through epochs and requires orders of magnitude more compute. Our method yields a 1.4% top-1 accuracy boost over state-of-the-art pruning counterparts and cuts GPU training cost by 2.4×, and hence offers a new efficiency-accuracy boundary for network pruning during training.
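
The stability check described in the abstract can be illustrated with a minimal sketch. This is not the paper's EPI formula, which the abstract does not give. It assumes that a sub-network's architecture is summarized by how many neurons each layer keeps under a global importance ranking, and that "stable" means the current structure closely matches those of the preceding epochs. The function names, the similarity measure, the `threshold` and the `window` are all hypothetical.

```python
from typing import Dict, List, Optional


def subnetwork_structure(importance: Dict[str, List[float]],
                         keep_ratio: float) -> Dict[str, int]:
    """Count, per layer, how many neurons survive a global top-k importance cut."""
    scored = [(score, layer)
              for layer, scores in importance.items()
              for score in scores]
    scored.sort(key=lambda item: item[0], reverse=True)
    k = max(1, int(round(keep_ratio * len(scored))))
    counts = {layer: 0 for layer in importance}
    for _, layer in scored[:k]:
        counts[layer] += 1
    return counts


def structure_similarity(a: Dict[str, int], b: Dict[str, int]) -> float:
    """Assumed measure: 1.0 for identical per-layer counts, lower as they diverge."""
    diffs = []
    for layer in a:
        denom = max(a[layer] + b[layer], 1)
        diffs.append(abs(a[layer] - b[layer]) / denom)
    return 1.0 - sum(diffs) / len(diffs)


def first_stable_epoch(history: List[Dict[str, int]],
                       threshold: float = 0.95,
                       window: int = 2) -> Optional[int]:
    """Return the first epoch whose structure matches each of the previous
    `window` epochs with similarity >= threshold, or None if none does."""
    for epoch in range(window, len(history)):
        recent = history[epoch - window:epoch]
        if all(structure_similarity(history[epoch], past) >= threshold
               for past in recent):
            return epoch
    return None
```

In a training loop, one would append the result of `subnetwork_structure` to `history` after each dense epoch and start pruning at the first epoch for which `first_stable_epoch` returns a value. The threshold and window trade off how early pruning starts against how confident the stability signal is.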

updated: Fri Oct 22 2021 18:39:22 GMT+0000 (UTC)

published: Fri Oct 22 2021 18:39:22 GMT+0000 (UTC)

arXiv
