The Elastic Lottery Ticket Hypothesis

Xiaohan Chen; Yu Cheng; Shuohang Wang; Zhe Gan; Jingjing Liu; Zhangyang Wang

The Lottery Ticket Hypothesis (LTH) has drawn keen attention to identifying sparse trainable subnetworks, or winning tickets, which can be trained in isolation to achieve similar or even better performance compared to the full models. Despite many efforts, the most effective method to identify such winning tickets is still Iterative Magnitude-based Pruning (IMP), which is computationally expensive and has to be run in full for every different network. A natural question arises: can we "transform" the winning ticket found in one network into one for another network with a different architecture, yielding a winning ticket for the latter at the outset, without redoing the expensive IMP? Answering this question is not only practically relevant for efficient "once-for-all" winning ticket finding, but also theoretically appealing for uncovering inherently scalable sparse patterns in networks. We conduct extensive experiments on CIFAR-10 and ImageNet, and propose a variety of strategies to tweak the winning tickets found from different networks of the same model family (e.g., ResNets). Based on these results, we articulate the Elastic Lottery Ticket Hypothesis (E-LTH): by mindfully replicating (or dropping) and re-ordering layers for one network, its corresponding winning ticket can be stretched (or squeezed) into a subnetwork for another deeper (or shallower) network from the same family, whose performance is nearly as competitive as the latter's winning ticket found directly by IMP. We also thoroughly compare E-LTH with pruning-at-initialization and dynamic sparse training methods, and discuss the generalizability of E-LTH to different model families, layer types, and datasets. Code is available at https://github.com/VITA-Group/ElasticLTH.
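To make the "stretch or squeeze" idea concrete, the following is a minimal sketch, not the authors' implementation. It assumes a winning ticket is stored as one binary mask per residual block within a stage, and it uses a simple proportional index mapping to decide which source block each target block copies. That mapping, the function name `elastic_transform`, and the toy masks are all illustrative assumptions; the abstract states only that layers are replicated (or dropped) and re-ordered, and the strategies actually evaluated are in the linked repository.

```python
import copy


def elastic_transform(block_masks, target_depth):
    """Map per-block masks from a source stage onto a stage with target_depth blocks.

    Assumption (illustrative only): target block i copies the mask of source
    block floor(i * source_depth / target_depth). With a larger target_depth
    this replicates blocks (stretch); with a smaller one it drops the
    trailing blocks (squeeze).
    """
    source_depth = len(block_masks)
    if source_depth == 0 or target_depth <= 0:
        raise ValueError("source and target depths must be positive")
    transformed = []
    for i in range(target_depth):
        src = i * source_depth // target_depth
        # Deep-copy so each target block owns an independent mask.
        transformed.append(copy.deepcopy(block_masks[src]))
    return transformed


# Toy ticket: three blocks, each with a 2x2 binary mask (1 = weight kept).
source_ticket = [
    [[1, 0], [0, 1]],
    [[0, 0], [1, 1]],
    [[1, 0], [0, 0]],
]

stretched = elastic_transform(source_ticket, 5)  # deeper network
squeezed = elastic_transform(source_ticket, 2)   # shallower network
```

In this sketch the stretched ticket reuses source blocks in the order 0, 0, 1, 1, 2, while the squeezed ticket keeps blocks 0 and 1 and drops block 2. A real setting would also need to handle the matching weight initializations and any layers whose shapes differ between stages, which this toy example ignores.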

updated: Mon Jun 07 2021 18:04:37 GMT+0000 (UTC)

published: Tue Mar 30 2021 17:53:45 GMT+0000 (UTC)

arXiv
