Scaling & Shifting Your Features: A New Baseline for Efficient Model Tuning

Dongze Lian; Daquan Zhou; Jiashi Feng; Xinchao Wang

Existing fine-tuning methods either tune all parameters of the pre-trained model (full fine-tuning), which is inefficient, or tune only the last linear layer (linear probing), which suffers a significant accuracy drop compared to full fine-tuning. In this paper, we propose a new parameter-efficient fine-tuning method termed SSF: researchers only need to Scale and Shift the deep Features extracted by a pre-trained model to catch up with the performance of full fine-tuning. Surprisingly, SSF also outperforms other parameter-efficient fine-tuning approaches, even with a smaller number of tunable parameters. Furthermore, unlike some existing parameter-efficient fine-tuning methods (e.g., Adapter or VPT) that introduce extra parameters and computational cost in both the training and inference stages, SSF adds learnable parameters only during training, and these additional parameters can be merged into the original pre-trained model weights via re-parameterization at inference. With the proposed SSF, our model obtains relative improvements of 2.46% (90.72% vs. 88.54%) and 11.48% (73.10% vs. 65.57%) in Top-1 accuracy on FGVC and VTAB-1k, respectively, compared to full fine-tuning, while fine-tuning only about 0.3M parameters. We also conduct extensive experiments across various model families (CNNs, Transformers, and MLPs) and datasets. Results on 26 image classification datasets in total and 3 robustness and out-of-distribution datasets show the effectiveness of SSF. Code is available at https://github.com/dongzelian/SSF.
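
To make the re-parameterization claim concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes the scale and shift are per-channel vectors applied to the output of a linear layer; the abstract does not spell out where they are inserted, and the names `ssf`, `merge_ssf_into_linear`, `scale`, and `shift` are hypothetical, chosen only for illustration. Under that assumption, scale * (W x + b) + shift equals (scale * W) x + (scale * b + shift), so the extra parameters can be folded into the existing weight and bias.

```python
import numpy as np


def ssf(features, scale, shift):
    """Scale and shift features channel-wise: y = scale * x + shift."""
    return features * scale + shift


def merge_ssf_into_linear(weight, bias, scale, shift):
    """Fold a scale/shift that follows a linear layer into its weight and bias.

    Assumes weight has shape (out_dim, in_dim) and that bias, scale, and
    shift each have shape (out_dim,).
    """
    merged_weight = weight * scale[:, None]  # scale each output row of W
    merged_bias = bias * scale + shift
    return merged_weight, merged_bias


# Illustrative random values; none of these come from the paper.
rng = np.random.default_rng(0)
in_dim, out_dim = 8, 4
weight = rng.normal(size=(out_dim, in_dim))
bias = rng.normal(size=out_dim)
scale = rng.normal(loc=1.0, scale=0.1, size=out_dim)
shift = rng.normal(scale=0.1, size=out_dim)
x = rng.normal(size=(3, in_dim))

# Training-time view: frozen linear layer followed by learnable scale/shift.
train_out = ssf(x @ weight.T + bias, scale, shift)

# Inference-time view: a single linear layer with merged parameters.
merged_weight, merged_bias = merge_ssf_into_linear(weight, bias, scale, shift)
infer_out = x @ merged_weight.T + merged_bias

outputs_match = np.allclose(train_out, infer_out)
```

In this sketch the merged layer has exactly the same shapes as the original one, which is why no additional parameters or computation remain at inference time.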

updated: Sun Jan 15 2023 10:31:33 GMT+0000 (UTC)

published: Mon Oct 17 2022 08:14:49 GMT+0000 (UTC)

arXiv
