Parameter-efficient Model Adaptation for Vision Transformers

Xuehai He; Chunyuan Li; Pengchuan Zhang; Jianwei Yang; Xin Eric Wang

In computer vision, adapting large-scale pretrained vision models (e.g., vision transformers) to downstream tasks has achieved strong transfer learning performance. Common approaches to model adaptation either update all model parameters or rely on linear probes. In this paper, we study parameter-efficient model adaptation strategies for vision transformers on the image classification task. We formulate efficient model adaptation as a subspace training problem and benchmark a comprehensive set of efficient adaptation methods. For each method, we conduct an empirical study that examines its performance alongside its parameter cost. Furthermore, we propose a parameter-efficient model adaptation framework that first selects submodules by measuring local intrinsic dimensions and then projects them into a subspace for further decomposition via a novel Kronecker Adaptation (KAdaptation) method. We analyze our method and compare it with a diverse set of baseline model adaptation methods (including state-of-the-art methods for pretrained language models). Our method achieves the best tradeoff between accuracy and parameter efficiency across 20 image classification datasets under the few-shot setting and 7 image classification datasets under the full-shot setting.
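To give a rough sense of why a Kronecker-style decomposition can save parameters, here is a minimal sketch in plain Python. It is not the paper's KAdaptation implementation: the abstract does not specify the exact factorization, so the single-term update `delta = kron(A, B)`, the helper names `kron`, `add`, and `param_counts`, and the example sizes are all assumptions made for illustration. The pretrained weight is treated as frozen, and only the two small factors would be trained.

```python
def kron(a, b):
    """Kronecker product of two matrices given as lists of rows.

    For a of shape (m, n) and b of shape (p, q) the result has shape
    (m * p, n * q), with entry a[i][j] * b[k][l] at row i * p + k,
    column j * q + l.
    """
    return [
        [x * y for x in row_a for y in row_b]
        for row_a in a
        for row_b in b
    ]


def add(w, delta):
    """Element-wise sum of two equally shaped matrices."""
    return [[x + y for x, y in zip(rw, rd)] for rw, rd in zip(w, delta)]


def param_counts(d_out, d_in, m, n):
    """Trainable parameters: full update vs. one Kronecker-factored update.

    Hypothetical setup: A has shape (m, n) and B has shape
    (d_out // m, d_in // n), so kron(A, B) matches a (d_out, d_in) weight.
    """
    full = d_out * d_in
    factored = m * n + (d_out // m) * (d_in // n)
    return full, factored


# Toy example: a frozen 4x4 weight adapted through two 2x2 factors.
W = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]  # frozen
A = [[1, 2], [3, 4]]  # trainable factor (illustrative values)
B = [[0, 1], [1, 0]]  # trainable factor (illustrative values)

delta = kron(A, B)          # 4x4 update built from 8 trainable numbers
W_adapted = add(W, delta)   # weight that would be used at inference time

full, factored = param_counts(768, 768, 16, 16)
print(full, factored)
```

For a hypothetical 768 x 768 layer with 16 x 16 and 48 x 48 factors, the sketch counts 589,824 parameters for a full update against 2,560 for the factored one. How the actual method chooses which submodules to adapt, and how it further decomposes the factors, is described in the paper itself rather than here.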

updated: Thu Jul 13 2023 22:12:10 GMT+0000 (UTC)

published: Tue Mar 29 2022 05:30:09 GMT+0000 (UTC)

arXiv
