Multi-Prize Lottery Ticket Hypothesis: Finding Accurate Binary Neural Networks by Pruning A Randomly Weighted Network

James Diffenderfer; Bhavya Kailkhura

Recently, Frankle & Carbin (2019) demonstrated that randomly-initialized dense networks contain subnetworks that, once found, can be trained to reach test accuracy comparable to the trained dense network. However, finding these high-performing trainable subnetworks is expensive, requiring an iterative process of training and pruning weights. In this paper, we propose (and prove) a stronger Multi-Prize Lottery Ticket Hypothesis: A sufficiently over-parameterized neural network with random weights contains several subnetworks (winning tickets) that (a) have comparable accuracy to a dense target network with learned weights (prize 1), (b) do not require any further training to achieve prize 1 (prize 2), and (c) are robust to extreme forms of quantization (i.e., binary weights and/or activations) (prize 3). This provides a new paradigm for learning compact yet highly accurate binary neural networks simply by pruning and quantizing randomly weighted full-precision neural networks. We also propose an algorithm for finding multi-prize tickets (MPTs) and test it by performing a series of experiments on the CIFAR-10 and ImageNet datasets. Empirical results indicate that as models grow deeper and wider, multi-prize tickets start to reach similar (and sometimes even higher) test accuracy compared to their significantly larger, full-precision counterparts that have been weight-trained. Without ever updating the weight values, our MPTs-1/32 not only set new binary weight network state-of-the-art (SOTA) Top-1 accuracy (94.8% on CIFAR-10 and 74.03% on ImageNet) but also outperform their full-precision counterparts by 1.78% and 0.76%, respectively. Further, our MPT-1/1 achieves SOTA Top-1 accuracy (91.9%) for binary neural networks on CIFAR-10. Code and pre-trained models are available at: https://github.com/chrundle/biprop.
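
To make the "prune and quantize a randomly weighted network" idea concrete, here is a minimal NumPy sketch for a single weight matrix. It is an illustration under stated assumptions, not the authors' algorithm: the abstract does not describe how tickets are found, so the per-weight `scores` (random here, where a real search would presumably learn them), the `keep_fraction` parameter, the shared scale `alpha` (mean magnitude of the surviving weights), and the function name `prune_and_binarize` are all hypothetical choices made for this example.

```python
import numpy as np


def prune_and_binarize(weights, scores, keep_fraction):
    """Keep the top-scoring fraction of weights and binarize the survivors.

    Hypothetical helper for illustration only. Returns (mask, binary), where
    binary has entries in {-alpha, 0, +alpha}. The random weights themselves
    are never updated.
    """
    if not 0.0 < keep_fraction <= 1.0:
        raise ValueError("keep_fraction must be in (0, 1]")

    n_keep = max(1, int(round(keep_fraction * weights.size)))

    # Indices of the n_keep largest scores (ties broken arbitrarily).
    keep_idx = np.argpartition(scores.ravel(), -n_keep)[-n_keep:]
    mask = np.zeros(weights.size, dtype=bool)
    mask[keep_idx] = True
    mask = mask.reshape(weights.shape)

    # Assumption: one shared scale, the mean magnitude of the kept weights.
    alpha = np.abs(weights[mask]).mean()

    # Map each kept weight to +1 or -1 (zero is treated as +1), prune the rest.
    signs = np.where(weights >= 0, 1.0, -1.0)
    binary = np.where(mask, alpha * signs, 0.0)
    return mask, binary


rng = np.random.default_rng(0)
W = rng.standard_normal((4, 6))   # random, untrained weights
S = rng.random((4, 6))            # stand-in scores; a real search would learn these
mask, Wb = prune_and_binarize(W, S, keep_fraction=0.5)
```

After the call, `Wb` keeps half of the 24 entries, and every kept entry has the same magnitude with the sign of the original random weight. Only the mask and the scale distinguish this binary layer from its random full-precision source.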

updated: Wed Mar 17 2021 00:31:24 GMT+0000 (UTC)

published: Wed Mar 17 2021 00:31:24 GMT+0000 (UTC)

arXiv
