The Elastic Lottery Ticket Hypothesis

Xiaohan Chen; Yu Cheng; Shuohang Wang; Zhe Gan; Jingjing Liu; Zhangyang Wang

Lottery Ticket Hypothesis (LTH) has drawn keen attention to identifying sparse trainable subnetworks, or winning tickets, which can be trained in isolation to achieve similar or even better performance compared to the full models. Despite many efforts, the most effective method to identify such winning tickets is still Iterative Magnitude-based Pruning (IMP), which is computationally expensive and has to be run in full for every different network. A natural question arises: can we "transform" the winning ticket found in one network to another with a different architecture, yielding a winning ticket for the latter at the beginning, without re-doing the expensive IMP? Answering this question is not only practically relevant for efficient "once-for-all" winning ticket finding, but also theoretically appealing for uncovering inherently scalable sparse patterns in networks. We conduct extensive experiments on CIFAR-10 and ImageNet, and propose a variety of strategies to tweak the winning tickets found from different networks of the same model family (e.g., ResNets). Based on these results, we articulate the Elastic Lottery Ticket Hypothesis (E-LTH): by mindfully replicating (or dropping) and re-ordering layers for one network, its corresponding winning ticket could be stretched (or squeezed) into a subnetwork for another deeper (or shallower) network from the same family, whose performance is nearly as competitive as the latter's winning ticket directly found by IMP. We also extensively compare E-LTH with pruning-at-initialization and dynamic sparse training methods, and discuss the generalizability of E-LTH to different model families and layer types, and across datasets. Code is available at https://github.com/VITA-Group/ElasticLTH.
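To make the stretching and squeezing idea concrete, here is a minimal Python sketch. It rests on a few assumptions. A ticket is stored as a list of per-residual-block entries (for example, each block's mask and initial weights), grouped by stage. The first block of each stage is kept fixed, because in ResNets it may change stride or channel count. The remaining blocks are replicated or dropped with a simple evenly spaced, order-preserving rule. The function names `elastic_stage` and `elastic_ticket` and this mapping rule are hypothetical illustrations. The paper compares several strategies, and this sketch is not claimed to be the one it recommends.

```python
import copy
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def elastic_stage(blocks: Sequence[T], target_len: int) -> List[T]:
    """Stretch (replicate) or squeeze (drop) one stage's per-block ticket entries.

    Hypothetical rule: keep the first block as-is, then map each of the
    target's remaining slots onto an evenly spaced source block, preserving
    relative order. Replicas are deep copies so they can be trained independently.
    """
    if len(blocks) < 2 or target_len < 2:
        raise ValueError("each stage needs a first block plus at least one regular block")
    head, body = blocks[0], list(blocks[1:])
    k, m = len(body), target_len - 1
    # m > k: some blocks are replicated; m < k: some blocks are dropped.
    new_body = [copy.deepcopy(body[(j * k) // m]) for j in range(m)]
    return [copy.deepcopy(head)] + new_body


def elastic_ticket(stages: Sequence[Sequence[T]], target_depths: Sequence[int]) -> List[List[T]]:
    """Apply elastic_stage stage by stage (hypothetical helper)."""
    if len(stages) != len(target_depths):
        raise ValueError("need one target depth per stage")
    return [elastic_stage(s, d) for s, d in zip(stages, target_depths)]


if __name__ == "__main__":
    # Labels stand in for (mask, initial weights) pairs of one CIFAR ResNet-32 stage.
    stage = ["b0", "b1", "b2", "b3", "b4"]
    print(elastic_stage(stage, 9))  # stretch toward a ResNet-56 stage
    print(elastic_stage(stage, 3))  # squeeze toward a ResNet-20 stage
```

CIFAR-style ResNets have depth 6n+2 with n blocks in each of three stages. Stretching a ResNet-32 ticket (n=5) toward ResNet-56 (n=9) therefore means calling `elastic_ticket` with target depths `[9, 9, 9]`, and squeezing toward ResNet-20 (n=3) means `[3, 3, 3]`. Deep-copying replicas keeps the copied blocks independent once training of the transformed ticket begins.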

updated: Thu Oct 28 2021 01:56:04 GMT+0000 (UTC)

published: Tue Mar 30 2021 17:53:45 GMT+0000 (UTC)

arXiv
