Keep the Gradients Flowing: Using Gradient Flow to Study Sparse Network Optimization

Kale-ab Tessera; Sara Hooker; Benjamin Rosman

Training sparse networks to converge to the same performance as dense neural architectures has proven to be elusive. Recent work suggests that initialization is the key. However, while this direction of research has had some success, focusing on initialization alone appears to be inadequate. In this paper, we take a broader view of training sparse networks and consider the role of regularization, optimization, and architecture choices in sparse models. We propose a simple experimental framework, Same Capacity Sparse vs Dense Comparison (SC-SDC), that allows for a fair comparison of sparse and dense networks. Furthermore, we propose a new measure of gradient flow, Effective Gradient Flow (EGF), that better correlates with performance in sparse networks. Using top-line metrics, SC-SDC and EGF, we show that default choices of optimizers, activation functions and regularizers used for dense networks can disadvantage sparse networks. Based upon these findings, we show that gradient flow in sparse networks can be improved by reconsidering aspects of the architecture design and the training regime. Our work suggests that initialization is only one piece of the puzzle and that taking a wider view of tailoring optimization to sparse networks yields promising results.

updated: Wed Jun 16 2021 02:49:34 GMT+0000 (UTC)

published: Tue Feb 02 2021 18:40:26 GMT+0000 (UTC)

arXiv
