Do We Actually Need Dense Over-Parameterization? In-Time Over-Parameterization in Sparse Training

Shiwei Liu; Lu Yin; Decebal Constantin Mocanu; Mykola Pechenizkiy

In this paper, we introduce a new perspective on training deep neural networks capable of state-of-the-art performance without expensive over-parameterization, by proposing the concept of In-Time Over-Parameterization (ITOP) in sparse training. By starting from a random sparse network and continuously exploring sparse connectivities during training, we can perform over-parameterization in the space-time manifold, closing the gap in expressibility between sparse training and dense training. We further use ITOP to understand the underlying mechanism of Dynamic Sparse Training (DST) and show that the benefits of DST come from its ability to consider, across time, all possible parameters when searching for the optimal sparse connectivity. As long as sufficient parameters have been reliably explored during training, DST can outperform the dense neural network by a large margin. We present a series of experiments to support our conjecture and achieve state-of-the-art sparse training performance with ResNet-50 on ImageNet. More impressively, our method achieves dominant performance over over-parameterization-based sparse methods at extreme sparsity levels. When trained on CIFAR-100, our method can match the performance of the dense model even at an extreme sparsity (98%). Code can be found at https://github.com/Shiweiliuiiiiiii/In-Time-Over-Parameterization.
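To make the idea of exploring parameters "across time" concrete, the following is a minimal sketch, not the authors' implementation. It assumes a prune-and-grow update (drop the smallest-magnitude active weights, regrow the same number at random), which the abstract does not specify, and it replaces real gradient steps with random perturbations. The names `random_mask`, `prune_and_grow`, and `simulate` are hypothetical. The sketch tracks what fraction of all parameter positions has been active at least once, while the number of active parameters stays fixed.

```python
import random


def random_mask(n_params, density, rng):
    """Return the set of active indices of a random sparse network."""
    n_active = max(1, round(n_params * density))
    return set(rng.sample(range(n_params), n_active))


def prune_and_grow(active, weights, n_params, drop_fraction, rng):
    """One connectivity update (assumed rule, not taken from the abstract):
    drop the smallest-magnitude active weights, then activate the same
    number of positions that were inactive before the update."""
    n_drop = int(len(active) * drop_fraction)
    if n_drop == 0:
        return active
    by_magnitude = sorted(active, key=lambda i: abs(weights[i]))
    dropped = set(by_magnitude[:n_drop])
    inactive = [i for i in range(n_params) if i not in active]
    grown = set(rng.sample(inactive, n_drop))
    for i in dropped | grown:
        weights[i] = 0.0  # dropped weights are zeroed; grown ones start at zero
    return (active - dropped) | grown


def simulate(n_params=1000, density=0.1, drop_fraction=0.3,
             n_updates=50, seed=0):
    """Track the fraction of parameter positions explored over time."""
    rng = random.Random(seed)
    weights = [0.0] * n_params
    active = random_mask(n_params, density, rng)
    explored = set(active)
    history = [len(explored) / n_params]
    for _ in range(n_updates):
        # Stand-in for gradient steps: randomly perturb the active weights.
        for i in active:
            weights[i] += rng.gauss(0.0, 1.0)
        active = prune_and_grow(active, weights, n_params, drop_fraction, rng)
        explored |= active
        history.append(len(explored) / n_params)
    return active, history


if __name__ == "__main__":
    active, history = simulate()
    print(f"active parameters at the end: {len(active)}")
    print(f"explored fraction: start {history[0]:.2f}, end {history[-1]:.2f}")
```

In this toy setting the sparsity level never changes, yet the explored fraction grows with every connectivity update. That is the sense in which a sparse network can cover a much larger parameter space over the course of training than it holds at any single moment.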

updated: Tue Jun 15 2021 05:01:46 GMT+0000 (UTC)

published: Thu Feb 04 2021 20:59:31 GMT+0000 (UTC)

arXiv
