Instant Soup: Cheap Pruning Ensembles in A Single Pass Can Draw Lottery Tickets from Large Models

Ajay Jaiswal; Shiwei Liu; Tianlong Chen; Ying Ding; Zhangyang Wang

Large pre-trained transformers have received explosive attention in the past few years because fine-tuning adapts them to numerous downstream applications, but their exponentially increasing parameter counts are becoming a primary hurdle to even fine-tuning them without industry-standard hardware. Recently, the Lottery Ticket Hypothesis (LTH) and its variants have been used to prune these large pre-trained models, generating subnetworks that can match the performance of their dense counterparts. The practicality of LTH, however, is severely limited by the repeated full-training-and-pruning routine of iterative magnitude pruning (IMP), whose cost worsens with increasing model size. Motivated by recent observations on model soups, which suggest that the fine-tuned weights of multiple models can be merged into a better minimum, we propose Instant Soup Pruning (ISP) to generate lottery-ticket-quality subnetworks at a fraction of the original IMP cost by replacing the expensive intermediate pruning stages of IMP with a computationally efficient routine for weak mask generation and aggregation. More specifically, during the mask generation stage, ISP runs a small handful of iterations under varying training protocols and data subsets to generate many weak and noisy subnetworks, then superposes them to average out the noise, creating a high-quality denoised subnetwork. Our extensive experiments and ablations on two popular large-scale pre-trained models, CLIP (unexplored in pruning to date) and BERT, across multiple benchmark vision and language datasets validate the effectiveness of ISP compared to several state-of-the-art pruning methods. Code is available at: https://github.com/VITA-Group/instant_soup
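
The mask generation and aggregation idea described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation (see the repository linked above for that). It assumes a flat list of weights, and it uses Gaussian perturbation as a hypothetical stand-in for the short training runs with varying protocols and data subsets. The function names `magnitude_mask`, `weak_masks`, and `aggregate_masks` are invented for illustration.

```python
import random


def magnitude_mask(scores, sparsity):
    """Return a 0/1 mask that keeps the entries with the largest magnitude."""
    n_keep = len(scores) - int(round(sparsity * len(scores)))
    # Indices sorted by descending magnitude; ties keep the lower index first.
    order = sorted(range(len(scores)), key=lambda i: abs(scores[i]), reverse=True)
    keep = set(order[:n_keep])
    return [1 if i in keep else 0 for i in range(len(scores))]


def weak_masks(weights, sparsity, n_masks, noise_std, seed=0):
    """Generate many cheap, noisy masks.

    Assumption: Gaussian noise on the weights stands in for a few
    iterations of training under different protocols and data subsets.
    """
    rng = random.Random(seed)
    masks = []
    for _ in range(n_masks):
        noisy = [w + rng.gauss(0.0, noise_std) for w in weights]
        masks.append(magnitude_mask(noisy, sparsity))
    return masks


def aggregate_masks(masks, sparsity):
    """Superpose weak masks: average the per-weight votes, then keep the
    most frequently selected weights at the target sparsity."""
    n = len(masks[0])
    votes = [sum(m[i] for m in masks) / len(masks) for i in range(n)]
    return magnitude_mask(votes, sparsity)


# Usage: 25 weak masks at 50% sparsity, merged into one denoised mask.
toy_weights = [0.9, -0.05, 0.4, 0.01, -0.7, 0.1, 0.03, -0.3]
toy_masks = weak_masks(toy_weights, sparsity=0.5, n_masks=25, noise_std=0.1)
denoised = aggregate_masks(toy_masks, sparsity=0.5)
```

Averaging votes across masks means a weight survives only if most of the weak masks agree on it, which is one simple way to read "superpose them to average out the noise".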

updated: Sun Jun 18 2023 03:09:52 GMT+0000 (UTC)

published: Sun Jun 18 2023 03:09:52 GMT+0000 (UTC)

arXiv
