Recurrent Parameter Generators

Jiayun Wang; Yubei Chen; Stella X. Yu; Brian Cheung; Yann LeCun

Deep learning has achieved tremendous success by training increasingly large models, which are then compressed for practical deployment. We propose a drastically different approach to compact and optimal deep learning: we decouple the degrees of freedom (DoF) of a model from its actual number of parameters, and optimize a small DoF under predefined random linear constraints for a large model of arbitrary architecture, in one-stage end-to-end learning. Specifically, we create a recurrent parameter generator (RPG), which repeatedly fetches parameters from a ring and unpacks them onto a large model with random permutation and sign flipping to promote parameter decorrelation. We show that gradient descent can automatically find the best model under these constraints, with faster convergence. Our extensive experiments reveal a log-linear relationship between model DoF and accuracy. RPG demonstrates remarkable DoF reduction, and the resulting model can be further pruned and quantized for additional run-time performance gain. For example, in terms of top-1 accuracy on ImageNet, RPG achieves 96% of ResNet18's performance with only 18% of its DoF (the equivalent of one convolutional layer) and 52% of ResNet34's performance with only 0.25% of its DoF. Our work shows the significant potential of constrained neural optimization for compact and optimal deep learning.
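The ring-and-unpack idea described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names (`make_rpg`, `unpack`, `ring_gradient`), the initialization scale, and the choice to permute within each fetched segment are assumptions made for illustration. It shows only how a small trainable ring can supply the weights of several larger layers through fixed random permutations and sign flips, and how layer gradients fold back onto the ring.

```python
import numpy as np

def make_rpg(ring_size, layer_shapes, seed=0):
    """Create a trainable ring plus a fixed unpacking plan (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    ring = rng.standard_normal(ring_size) * 0.1  # the only free parameters (DoF)
    plan, offset = [], 0
    for shape in layer_shapes:
        n = int(np.prod(shape))
        # Fetch n consecutive entries, wrapping around the ring as needed.
        idx = (offset + np.arange(n)) % ring_size
        # Fixed random permutation and sign flips, drawn once and never trained.
        perm = rng.permutation(n)
        signs = rng.choice(np.array([-1.0, 1.0]), size=n)
        plan.append((shape, idx[perm], signs))
        offset = (offset + n) % ring_size
    return ring, plan

def unpack(ring, plan):
    """Generate every layer's weight tensor from the shared ring."""
    return [(signs * ring[idx]).reshape(shape) for shape, idx, signs in plan]

def ring_gradient(ring_size, plan, layer_grads):
    """Accumulate per-layer weight gradients back onto the ring entries."""
    grad = np.zeros(ring_size)
    for (shape, idx, signs), layer_grad in zip(plan, layer_grads):
        # np.add.at sums contributions when a ring entry is reused many times.
        np.add.at(grad, idx, signs * layer_grad.reshape(-1))
    return grad

# Usage: 10 free parameters generate 22 layer weights.
ring, plan = make_rpg(ring_size=10, layer_shapes=[(3, 4), (2, 5)])
weights = unpack(ring, plan)
```

Because each weight is a signed copy of a ring entry, the mapping from ring to model is linear, so ordinary gradient descent on the ring optimizes the large model under the fixed constraints.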

updated: Fri Oct 21 2022 21:12:29 GMT+0000 (UTC)

published: Thu Jul 15 2021 04:23:59 GMT+0000 (UTC)

arXiv
