Spending Your Winning Lottery Better After Drawing It

Ajay Kumar Jaiswal; Haoyu Ma; Tianlong Chen; Ying Ding; Zhangyang Wang

The Lottery Ticket Hypothesis (LTH) suggests that a dense neural network contains a sparse sub-network that can match the performance of the original dense network when trained in isolation from scratch. Most works retrain the sparse sub-network with the same training protocols as its dense network, such as initialization, architecture blocks, and training recipes. However, it remains unclear whether these training protocols are optimal for sparse networks. In this paper, we demonstrate that sparse retraining need not strictly inherit those properties from the dense network. Instead, by plugging in purposeful "tweaks" to the sparse sub-network architecture or its training recipe, retraining can be improved significantly over the default, especially at high sparsity levels. Combining all our proposed "tweaks" yields new state-of-the-art performance for LTH, and these modifications can be easily adapted to other sparse training algorithms in general. Specifically, we achieve a significant and consistent performance gain of 1.05% to 4.93% for ResNet-18 on CIFAR-100 over vanilla LTH. Moreover, our methods are shown to generalize across datasets (CIFAR-10, CIFAR-100, TinyImageNet) and architectures (VGG16, ResNet-18/ResNet-34, MobileNet). All code will be made publicly available.
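For readers unfamiliar with the "vanilla LTH" baseline mentioned in the abstract, the sketch below illustrates the general idea in its commonly described form: prune the smallest-magnitude weights of a trained network, then reset the survivors to their initial values before retraining. This is an assumption about the baseline, not the authors' code; the function names `magnitude_mask` and `rewind` and the toy weight values are hypothetical, and none of the paper's proposed "tweaks" are represented here.

```python
def magnitude_mask(weights, sparsity):
    """Return a 0/1 mask that prunes the `sparsity` fraction of smallest-magnitude weights."""
    n_prune = int(len(weights) * sparsity)
    # Indices sorted by ascending magnitude; the first n_prune are pruned.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = set(order[:n_prune])
    return [0 if i in pruned else 1 for i in range(len(weights))]


def rewind(initial_weights, mask):
    """Reset surviving weights to their initial values and zero out the pruned ones."""
    return [w0 * m for w0, m in zip(initial_weights, mask)]


# Hypothetical toy values standing in for one layer before and after dense training.
initial = [0.5, -0.1, 0.3, -0.7, 0.2, 0.05]
trained = [0.9, -0.02, 0.4, -1.2, 0.01, 0.3]

mask = magnitude_mask(trained, sparsity=0.5)  # mask == [1, 0, 1, 1, 0, 0]
ticket = rewind(initial, mask)                # sparse sub-network at its initial weights
```

In a real setting the mask would be computed over a framework's parameter tensors and the rewound sub-network would then be retrained with the mask held fixed; the retraining step is where the abstract's architecture and recipe changes would apply.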

updated: Mon Oct 11 2021 04:57:07 GMT+0000 (UTC)

published: Fri Jan 08 2021 23:33:53 GMT+0000 (UTC)

arXiv
