FreezeNet: Full Performance by Reduced Storage Costs

Paul Wimmer; Jens Mehnert; Alexandru Condurache

Pruning generates sparse networks by setting parameters to zero. In this work we improve one-shot pruning methods applied before training, without adding any storage costs and while preserving sparse gradient computations. The main difference from pruning is that we do not sparsify the network's weights; instead, we learn only a few key parameters and keep the others fixed at their randomly initialized values. This mechanism is called freezing the parameters. The frozen weights can be stored efficiently with a single 32-bit random seed. The parameters to be frozen are determined in one shot by a single forward and backward pass applied before training starts. We call the introduced method FreezeNet. In our experiments we show that FreezeNets achieve good results, especially for extreme freezing rates. Freezing weights preserves the gradient flow throughout the network; consequently, FreezeNets train better and have an increased capacity compared to their pruned counterparts. On the classification tasks MNIST and CIFAR-10/100 we outperform SNIP, the best reported one-shot pruning method applied before training in this setting. On MNIST, FreezeNet achieves 99.2% of the performance of the baseline LeNet-5-Caffe architecture while compressing the number of trained and stored parameters by a factor of 157.
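The freezing mechanism described above can be illustrated with a small sketch. This is not the authors' implementation: the abstract only states that the trainable parameters are chosen from a single forward and backward pass, so the score |w * g| used here is borrowed from SNIP as an assumption, and the toy quadratic loss as well as all function names (`init_weights`, `select_trainable`, `sgd_step`, `compress`, `decompress`) are hypothetical. The sketch shows that frozen weights never need to be stored, because they can be regenerated from the seed.

```python
import random

def init_weights(seed, n):
    """Regenerate the full random initialization from a single seed."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 0.1) for _ in range(n)]

def select_trainable(weights, grads, keep):
    """Pick the `keep` indices with the largest |w * g|.

    Assumption: a SNIP-style score; the abstract does not name the criterion.
    """
    scores = [abs(w * g) for w, g in zip(weights, grads)]
    order = sorted(range(len(weights)), key=lambda i: scores[i], reverse=True)
    return sorted(order[:keep])

def sgd_step(weights, grads, trainable, lr):
    """Update only the trainable weights; frozen ones keep their initial value."""
    for i in trainable:
        weights[i] -= lr * grads[i]

def compress(seed, weights, trainable):
    """Store the seed, the size, and only the trained weights."""
    return {"seed": seed, "n": len(weights),
            "trained": {i: weights[i] for i in trainable}}

def decompress(blob):
    """Rebuild all weights: frozen ones from the seed, trained ones from storage."""
    weights = init_weights(blob["seed"], blob["n"])
    for i, value in blob["trained"].items():
        weights[i] = value
    return weights

# Toy setup (hypothetical): loss = 0.5 * sum((w - t)^2), so the gradient is w - t.
SEED, N, KEEP = 1234, 20, 3
targets = [0.5] * N

w0 = init_weights(SEED, N)
weights = list(w0)

# One-shot selection from a single gradient evaluation before training.
grads = [w - t for w, t in zip(weights, targets)]
trainable = select_trainable(weights, grads, KEEP)

# Training touches only the selected entries.
for _ in range(50):
    grads = [w - t for w, t in zip(weights, targets)]
    sgd_step(weights, grads, trainable, lr=0.1)

blob = compress(SEED, weights, trainable)
restored = decompress(blob)
```

In this toy run only `KEEP` values plus the seed are stored, yet `restored` matches the trained weight vector exactly, since the frozen entries are reproduced by re-running the seeded initializer.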

updated: Sat Nov 28 2020 08:32:44 GMT+0000 (UTC)

published: Sat Nov 28 2020 08:32:44 GMT+0000 (UTC)

arXiv
