Dynamic Sparse Training via Balancing the Exploration-Exploitation Trade-off

Shaoyi Huang; Bowen Lei; Dongkuan Xu; Hongwu Peng; Yue Sun; Mimi Xie; Caiwen Ding

Over-parameterized deep neural networks (DNNs) achieve high prediction accuracy in many applications. Although effective, their large number of parameters hinders deployment on resource-limited devices and has an outsized environmental impact. Sparse training (using a fixed number of nonzero weights in each iteration) can significantly reduce training costs by shrinking the model. However, existing sparse training methods mainly use either random or greedy drop-and-grow strategies, which leads to local minima and low accuracy. In this work, we treat dynamic sparse training as a sparse connectivity search problem and design an acquisition function that balances exploitation and exploration to escape from local optima and saddle points. We further provide theoretical guarantees for the proposed method and clarify its convergence property. Experimental results show that sparse models (up to 98% sparsity) obtained by our proposed method outperform the SOTA sparse training methods on a wide variety of deep learning tasks. On VGG-19 / CIFAR-100, ResNet-50 / CIFAR-10, and ResNet-50 / CIFAR-100, our method achieves even higher accuracy than the dense models. On ResNet-50 / ImageNet, the proposed method improves accuracy by up to 8.2% over SOTA sparse training methods.
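To make the drop-and-grow idea concrete, the following is a minimal sketch of a single update on one flattened layer. The abstract does not give the form of the acquisition function, so the score used here (gradient magnitude as the exploitation term plus a UCB-style bonus `c * ln(step + 1) / (count + eps)` as the exploration term) is an assumption for illustration only. The function name `drop_and_grow`, the `counts` bookkeeping, and the parameters `k`, `c`, and `eps` are likewise hypothetical, as is the choice to drop by weight magnitude and to initialize grown weights at zero.

```python
import math


def drop_and_grow(weights, mask, grads, counts, step, k, c=1.0, eps=1.0):
    """One drop-and-grow update that keeps the number of nonzero weights fixed.

    weights, grads: lists of floats for one flattened layer.
    mask: list of 0/1 flags marking active connections.
    counts: number of updates each connection has been active (assumed bookkeeping).
    """
    active = [i for i, m in enumerate(mask) if m]
    inactive = [i for i, m in enumerate(mask) if not m]
    k = min(k, len(active), len(inactive))

    # Drop: remove the k active connections with the smallest magnitude.
    dropped = sorted(active, key=lambda i: abs(weights[i]))[:k]

    # Grow: rank inactive connections by exploitation plus exploration.
    def score(i):
        exploit = abs(grads[i])  # a large gradient suggests a useful connection
        explore = c * math.log(step + 1) / (counts[i] + eps)  # rarely active -> bonus
        return exploit + explore

    grown = sorted(inactive, key=score, reverse=True)[:k]

    new_mask = list(mask)
    new_weights = list(weights)
    for i in dropped:
        new_mask[i] = 0
        new_weights[i] = 0.0
    for i in grown:
        new_mask[i] = 1
        new_weights[i] = 0.0  # assumption: grown connections start at zero
    new_counts = [n + m for n, m in zip(counts, new_mask)]
    return new_weights, new_mask, new_counts


# Toy layer: two active and two inactive connections.
weights = [0.5, -0.1, 0.0, 0.0]
mask = [1, 1, 0, 0]
grads = [0.0, 0.0, 0.3, 0.2]
counts = [5, 5, 3, 0]

_, greedy_mask, _ = drop_and_grow(weights, mask, grads, counts, step=9, k=1, c=0.0)
_, ee_mask, ee_counts = drop_and_grow(weights, mask, grads, counts, step=9, k=1, c=1.0)
print(greedy_mask, ee_mask)
```

With `c=0.0` the score reduces to a purely greedy gradient criterion and index 2 is grown. With `c=1.0` the never-activated index 3 wins because of its exploration bonus. This is the kind of behavior the abstract attributes to balancing exploration against exploitation, shown here only under the assumed score.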

updated: Mon Apr 24 2023 04:24:07 GMT+0000 (UTC)

published: Wed Nov 30 2022 01:22:25 GMT+0000 (UTC)

arXiv
