Towards Efficient Visual Adaption via Structural Re-parameterization

Gen Luo; Minglang Huang; Yiyi Zhou; Xiaoshuai Sun; Guannan Jiang; Zhiyu Wang; Rongrong Ji

Parameter-efficient transfer learning (PETL) is an emerging research area that aims to adapt large-scale pre-trained models to downstream tasks inexpensively. Recent advances have greatly reduced the storage cost of various pre-trained models by updating a small number of parameters instead of fully tuning the model. However, we notice that most existing PETL methods still incur non-negligible latency during inference. In this paper, we propose a parameter-efficient and computationally friendly adapter for giant vision models, called RepAdapter. Specifically, we first prove that common adaptation modules can be seamlessly integrated into most giant vision models via our structural re-parameterization, so that they add zero cost during inference. We then investigate the sparse design and effective placement of the adapter structure, which give RepAdapter further advantages in parameter efficiency and performance. To validate RepAdapter, we conduct extensive experiments on 27 benchmark datasets covering three vision tasks: image classification, video classification, and semantic segmentation. Experimental results show that RepAdapter is superior to state-of-the-art PETL methods in both performance and efficiency. For instance, RepAdapter outperforms full tuning by +7.2% on average and saves up to 25% training time, 20% GPU memory, and 94.6% storage cost of ViT-B/16 on VTAB-1k. The generalization ability of RepAdapter is also well validated on a range of vision models. Our source code is released at https://github.com/luogen1996/RepAdapter.
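The zero-inference-cost claim rests on a simple linear-algebra fact: a purely linear adapter placed in front of a linear layer can be folded into that layer's weight and bias. The following is a minimal NumPy sketch of that idea, not the authors' implementation. It assumes a residual adapter with a low-rank down/up projection and no non-linearity; all names (`W_down`, `W_up`, `b_adapter`, `scale`, `merge_adapter`) are hypothetical, and the sparse design and placement choices mentioned in the abstract are not modeled.

```python
import numpy as np


def adapter_then_linear(x, W_down, W_up, b_adapter, scale, W, b):
    """Training-time path: residual linear adapter, then a frozen linear layer.

    Assumption: the adapter is purely linear (no activation), which is what
    makes the merge below exact.
    """
    h = x + scale * (x @ W_down @ W_up + b_adapter)
    return h @ W + b


def merge_adapter(W_down, W_up, b_adapter, scale, W, b):
    """Fold the adapter into the following layer's weight and bias.

    (x (I + s * W_down W_up) + s * b_adapter) W + b
        = x [(I + s * W_down W_up) W] + [s * b_adapter W + b]
    """
    d = W.shape[0]
    A = np.eye(d) + scale * (W_down @ W_up)
    W_merged = A @ W
    b_merged = scale * (b_adapter @ W) + b
    return W_merged, b_merged


def demo(seed=0):
    """Compare the two-step training path with the merged single layer."""
    rng = np.random.default_rng(seed)
    d, r, d_out, n = 8, 2, 5, 4  # hypothetical toy sizes
    x = rng.normal(size=(n, d))
    W_down = rng.normal(size=(d, r))
    W_up = rng.normal(size=(r, d))
    b_adapter = rng.normal(size=(d,))
    W = rng.normal(size=(d, d_out))
    b = rng.normal(size=(d_out,))
    scale = 0.5

    y_train = adapter_then_linear(x, W_down, W_up, b_adapter, scale, W, b)
    W_m, b_m = merge_adapter(W_down, W_up, b_adapter, scale, W, b)
    y_infer = x @ W_m + b_m  # one matmul, same cost as the original layer
    return y_train, y_infer, W_m.shape


if __name__ == "__main__":
    y_train, y_infer, shape = demo()
    print("outputs match:", np.allclose(y_train, y_infer))
    print("merged weight shape:", shape)
```

After merging, the layer has the same shape as the original frozen layer, so under these assumptions the adapted model runs with no extra parameters or operations at inference time.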

updated: Tue Mar 21 2023 02:51:27 GMT+0000 (UTC)

published: Thu Feb 16 2023 06:14:15 GMT+0000 (UTC)

arXiv
