Parameter-efficient is not sufficient: Exploring Parameter, Memory, and Time Efficient Adapter Tuning for Dense Predictions

Dongshuo Yin; Xueting Han; Bin Li; Hao Feng; Jing Bai

Pre-training followed by fine-tuning is a prevalent paradigm in computer vision (CV). Recently, parameter-efficient transfer learning (PETL) methods have shown promising performance in transferring knowledge from pre-trained models with only a few trainable parameters. Despite their success, existing PETL methods in CV can be computationally expensive and require large amounts of memory and time during training, which limits the ability of low-resource users to conduct research and build applications on large models. In this work, we propose Parameter, Memory, and Time Efficient Visual Adapter (E^3VA) tuning to address this issue. We provide a gradient backpropagation highway for low-rank adapters, which removes the large gradient computations through the frozen pre-trained parameters and yields substantial savings in training memory and training time. Furthermore, we optimise the E^3VA structure for dense prediction tasks to improve model performance. Extensive experiments on the COCO, ADE20K, and Pascal VOC benchmarks show that E^3VA can save up to 62.2% training memory and 26.2% training time on average, while achieving performance comparable to full fine-tuning and better than most PETL methods. Note that we can even train the Swin-Large-based Cascade Mask RCNN on GTX 1080Ti GPUs with less than 1.5% trainable parameters.
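To make "only a few trainable parameters" concrete, the sketch below counts the parameters of a generic low-rank (bottleneck) adapter against a rough weight count for one frozen transformer block. It is an illustration under stated assumptions, not the E^3VA structure itself: the function names, the hidden width `d = 1024`, the bottleneck rank `r = 32`, and the simplified block layout (four attention projections plus a two-layer MLP, with biases and normalisation layers ignored) are all hypothetical choices made for the example.

```python
def adapter_params(d: int, r: int) -> int:
    """Parameters of a generic bottleneck adapter (assumed layout):
    a down-projection d -> r and an up-projection r -> d, each with a bias."""
    down = d * r + r
    up = r * d + d
    return down + up


def transformer_block_params(d: int, mlp_ratio: int = 4) -> int:
    """Rough weight count of one frozen transformer block (assumption:
    biases and normalisation layers are ignored)."""
    attention = 4 * d * d        # query, key, value and output projections
    mlp = 2 * mlp_ratio * d * d  # two linear layers of the MLP
    return attention + mlp


def trainable_fraction(d: int, r: int) -> float:
    """Share of parameters that are trainable when only the adapter is updated."""
    adapter = adapter_params(d, r)
    return adapter / (transformer_block_params(d) + adapter)


if __name__ == "__main__":
    d, r = 1024, 32  # hypothetical width and rank, not taken from the paper
    print(f"adapter parameters:      {adapter_params(d, r)}")
    print(f"frozen block parameters: {transformer_block_params(d)}")
    print(f"trainable fraction:      {trainable_fraction(d, r):.4%}")
```

A small trainable fraction alone does not reduce training memory: if the adapter sits inside the backbone, gradients must still be propagated through the frozen layers to reach it. That cost is what the gradient backpropagation highway described above is meant to remove.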

updated: Fri Jun 16 2023 09:54:07 GMT+0000 (UTC)

published: Fri Jun 16 2023 09:54:07 GMT+0000 (UTC)

arXiv
