Can We Scale Transformers to Predict Parameters of Diverse ImageNet Models?

Boris Knyazev; Doha Hwang; Simon Lacoste-Julien

Pretraining a neural network on a large dataset is becoming a cornerstone of machine learning, yet it is within reach of only a few communities with large resources. We pursue the ambitious goal of democratizing pretraining. Towards that goal, we train and release a single neural network that can predict high-quality ImageNet parameters of other neural networks. By using the predicted parameters for initialization, we are able to boost training of diverse ImageNet models available in PyTorch. When transferred to other datasets, models initialized with predicted parameters also converge faster and reach competitive final performance.

updated: Tue Mar 07 2023 18:56:59 GMT+0000 (UTC)

published: Tue Mar 07 2023 18:56:59 GMT+0000 (UTC)

arXiv
