A Directed-Evolution Method for Sparsification and Compression of Neural Networks with Application to Object Identification and Segmentation and considerations of optimal quantization using small number of bits

Luiz M Franca-Neto

This work introduces the Directed-Evolution (DE) method for sparsification of neural networks, in which the relevance of parameters to network accuracy is assessed directly: the parameters that produce the least effect on accuracy when tentatively zeroed are then zeroed permanently. By mimicking evolution in the natural world, DE avoids the potential combinatorial explosion of candidate parameter sets to be zeroed in large networks. DE operates in a distillation context [5], in which the original network is the teacher and DE evolves the student network toward the sparsification goal while keeping the divergence between teacher and student minimal. After DE reaches the desired sparsification level in each layer of the network, a variety of quantization alternatives are applied to the surviving parameters to find the lowest number of bits that represents them with an acceptable loss of accuracy. A procedure for finding the optimal distribution of quantization levels in each sparsified layer is presented. A suitable lossless encoding of the surviving quantized parameters provides the final parameter representation. DE was applied to a sample of representative neural networks of progressively larger size, using the MNIST, FashionMNIST and COCO datasets. An 80-class YOLOv3 network with more than 60 million parameters, trained on the COCO dataset, reached 90% sparsification and, with 4-bit parameter quantization, correctly identifies and segments all objects identified by the original network with more than 80% confidence. The resulting compression is between 40x and 80x. It has not escaped the authors that techniques from different methods can be nested: once the best parameter set for sparsification is identified in a cycle of DE, a decision to zero only a subset of those parameters can be made using a combination of criteria such as parameter magnitude and Hessian approximations.
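The select-by-tentative-zeroing idea in the abstract can be illustrated with a toy sketch. It is not the author's implementation: the single linear layer, the random sampling of candidate sets, the mean-squared divergence, the uniform quantizer, and every name and default (`de_sparsify`, `quantize_uniform`, `n_candidates`, `frac_per_cycle`) are assumptions made for illustration only. The sketch also omits any retraining of the student between cycles.

```python
import numpy as np


def divergence(teacher_out, student_out):
    """Mean squared difference, a stand-in for the teacher-student divergence."""
    return float(np.mean((teacher_out - student_out) ** 2))


def de_sparsify(weights, inputs, target_sparsity,
                n_candidates=8, frac_per_cycle=0.05, seed=0):
    """Zero weights cycle by cycle, keeping the least harmful candidate set.

    Each cycle draws a few random candidate sets of surviving weights,
    tentatively zeroes each set, and permanently zeroes the set whose
    removal moves the student output least away from the teacher output.
    (Hypothetical procedure; the paper's actual selection rule may differ.)
    """
    rng = np.random.default_rng(seed)
    teacher_out = inputs @ weights          # the original layer is the teacher
    student = weights.copy()
    mask = np.ones(weights.shape, dtype=bool)   # True = weight still alive
    n_target = int(round(target_sparsity * weights.size))
    per_cycle = max(1, int(frac_per_cycle * weights.size))

    while int((~mask).sum()) < n_target:
        alive = np.flatnonzero(mask)
        n_zero = min(per_cycle, n_target - int((~mask).sum()))
        best_idx, best_div = None, np.inf
        for _ in range(n_candidates):
            idx = rng.choice(alive, size=n_zero, replace=False)
            trial = student.copy()
            trial.flat[idx] = 0.0           # tentative zeroing
            d = divergence(teacher_out, inputs @ trial)
            if d < best_div:
                best_idx, best_div = idx, d
        student.flat[best_idx] = 0.0        # make the best candidate permanent
        mask.flat[best_idx] = False
    return student, mask


def quantize_uniform(values, n_bits):
    """Uniform quantization to 2**n_bits levels (a simple stand-in, not the
    optimal level distribution the paper describes)."""
    lo, hi = values.min(), values.max()
    if hi == lo:
        return values.copy()
    step = (hi - lo) / (2 ** n_bits - 1)
    return lo + np.round((values - lo) / step) * step


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    W = rng.normal(size=(20, 5))            # toy teacher weights
    X = rng.normal(size=(64, 20))           # toy calibration inputs
    S, alive_mask = de_sparsify(W, X, target_sparsity=0.9)
    S[alive_mask] = quantize_uniform(S[alive_mask], n_bits=4)
    print("sparsity:", 1.0 - alive_mask.mean())
    print("divergence:", divergence(X @ W, X @ S))
```

Sampling a handful of candidate sets per cycle keeps the cost linear in the number of cycles, which is one simple way to sidestep enumerating every possible set. In a real network the divergence would be measured on the full model's outputs and not on a single layer.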

updated: Sun Jun 12 2022 23:49:08 GMT+0000 (UTC)

published: Sun Jun 12 2022 23:49:08 GMT+0000 (UTC)

arXiv
