One ticket to win them all: generalizing lottery ticket initializations across datasets and optimizers

Ari S. Morcos; Haonan Yu; Michela Paganini; Yuandong Tian

The success of lottery ticket initializations (Frankle and Carbin, 2019) suggests that small, sparsified networks can be trained so long as the network is initialized appropriately. Unfortunately, finding these "winning ticket" initializations is computationally expensive. One potential solution is to reuse the same winning tickets across a variety of datasets and optimizers. However, the generality of winning ticket initializations remains unclear. Here, we attempt to answer this question by generating winning tickets for one training configuration (optimizer and dataset) and evaluating their performance on another configuration. Perhaps surprisingly, we found that, within the natural images domain, winning ticket initializations generalized across a variety of datasets, including Fashion MNIST, SVHN, CIFAR-10/100, ImageNet, and Places365, often achieving performance close to that of winning tickets generated on the same dataset. Moreover, winning tickets generated using larger datasets consistently transferred better than those generated using smaller datasets. We also found that winning ticket initializations generalize across optimizers with high performance. These results suggest that winning ticket initializations generated by sufficiently large datasets contain inductive biases generic to neural networks more broadly which improve training across many settings and provide hope for the development of better initialization methods.
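The transfer procedure described above can be illustrated with a minimal sketch. It assumes one-shot magnitude pruning on a flat list of weights; the abstract does not specify the pruning schedule or the architecture, so the function names (`magnitude_mask`, `make_ticket`, `masked_sgd_step`) and the toy numbers are hypothetical and are not the authors' implementation. The idea shown is that a ticket consists of a sparsity mask derived from weights trained on a source configuration, together with the original initial values of the surviving weights, and that this pair is then reused as the starting point on a different configuration.

```python
def magnitude_mask(trained, keep_fraction):
    """Return a 0/1 mask keeping the largest-magnitude trained weights."""
    n_keep = max(1, round(len(trained) * keep_fraction))
    # Rank weight indices by descending absolute value.
    order = sorted(range(len(trained)), key=lambda i: abs(trained[i]), reverse=True)
    keep = set(order[:n_keep])
    return [1 if i in keep else 0 for i in range(len(trained))]


def make_ticket(init, trained, keep_fraction):
    """Build a ticket: the mask plus the ORIGINAL initial values of kept weights."""
    mask = magnitude_mask(trained, keep_fraction)
    ticket = [w * m for w, m in zip(init, mask)]
    return ticket, mask


def masked_sgd_step(weights, grads, mask, lr=0.1):
    """One gradient step on the target task; pruned weights stay at zero."""
    return [(w - lr * g) * m for w, g, m in zip(weights, grads, mask)]


# Toy stand-ins: `trained` plays the role of weights after training on the
# source configuration; real values would come from an actual training run.
init = [0.5, -0.2, 0.1, 0.9]
trained = [0.05, -1.5, 0.3, 2.0]

ticket, mask = make_ticket(init, trained, keep_fraction=0.5)
print(mask)    # [0, 1, 0, 1]
print(ticket)  # [0.0, -0.2, 0.0, 0.9]

# Reusing the ticket on a different configuration: training starts from
# `ticket`, and the mask is applied after every update.
stepped = masked_sgd_step(ticket, grads=[1.0, 1.0, 1.0, 1.0], mask=mask)
```

In this sketch the mask is chosen from the trained weights while the values are taken from the initialization, which is what distinguishes a ticket from an ordinary pruned model.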

updated: Sun Oct 27 2019 18:10:27 GMT+0000 (UTC)

published: Thu Jun 06 2019 18:46:39 GMT+0000 (UTC)

arXiv
