Lottery Jackpots Exist in Pre-trained Models

Yuxin Zhang; Mingbao Lin; Yunshan Zhong; Fei Chao; Rongrong Ji

Network pruning is an effective approach to reducing network complexity with an acceptable performance compromise. Existing studies achieve sparsity in neural networks via time-consuming weight training or complex searching on networks with expanded width, which greatly limits the applications of network pruning. In this paper, we show that high-performing sparse sub-networks that require no weight training, termed "lottery jackpots", exist in pre-trained models with unexpanded width. Furthermore, we improve the efficiency of searching for lottery jackpots from two perspectives. First, we observe that the sparse masks derived from many existing pruning criteria overlap heavily with the searched mask of our lottery jackpot, and that magnitude-based pruning yields the mask most similar to ours. Consequently, our searched lottery jackpot removes 90% of the weights in ResNet-50 while easily obtaining more than 70% top-1 accuracy on ImageNet using only 5 searching epochs. Following this insight, we initialize our sparse mask using magnitude-based pruning, which reduces the cost of the lottery jackpot search by at least 3x while achieving comparable or even better performance. Second, we conduct an in-depth analysis of the searching process for lottery jackpots. Our theoretical result suggests that the decrease in training loss during weight searching can be disturbed by the dependency between weights in modern networks. To mitigate this, we propose a novel short restriction method that restricts mask changes with potentially negative impacts on the training loss. Our code is available at https://github.com/zyxxmu/lottery-jackpots.
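The two ideas the abstract relies on, initializing a sparse mask by weight magnitude and measuring how much two masks overlap, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `magnitude_mask` and `mask_overlap`, the flat list of random weights, and the overlap definition (share of weights kept by one mask that the other also keeps) are all assumptions made for illustration, and the paper may measure overlap differently.

```python
import random


def magnitude_mask(weights, sparsity):
    """Return a 0/1 mask that keeps the largest-magnitude weights.

    Hypothetical helper: `sparsity` is the fraction of weights to remove.
    """
    n_keep = len(weights) - int(round(sparsity * len(weights)))
    # Indices ordered from largest to smallest absolute value.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]), reverse=True)
    keep = set(order[:n_keep])
    return [1 if i in keep else 0 for i in range(len(weights))]


def mask_overlap(mask_a, mask_b):
    """Fraction of the weights kept by mask_a that mask_b also keeps.

    Assumed definition, used here only to make "overlap" concrete.
    """
    kept_a = sum(mask_a)
    if kept_a == 0:
        return 0.0
    both = sum(1 for a, b in zip(mask_a, mask_b) if a and b)
    return both / kept_a


# Stand-in for one layer of pre-trained weights (random, for illustration only).
random.seed(0)
weights = [random.gauss(0.0, 1.0) for _ in range(1000)]

# Remove 90% of the weights; the surviving weights keep their pre-trained values.
mask = magnitude_mask(weights, sparsity=0.9)
sparse_weights = [w * m for w, m in zip(weights, mask)]
```

In a search of the kind the abstract describes, a mask like this would serve only as the starting point; the weights themselves stay fixed while the mask is refined.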

updated: Sat Sep 02 2023 05:09:41 GMT+0000 (UTC)

published: Sun Apr 18 2021 03:50:28 GMT+0000 (UTC)

arXiv
