Sensitivity-Aware Visual Parameter-Efficient Tuning

Haoyu He; Jianfei Cai; Jing Zhang; Dacheng Tao; Bohan Zhuang

Visual Parameter-Efficient Tuning (VPET) has become a powerful alternative to full fine-tuning for adapting pre-trained vision models to downstream tasks: it tunes only a small number of parameters while freezing the vast majority, easing the storage burden and optimization difficulty. However, existing VPET methods introduce trainable parameters at the same positions across different tasks, relying solely on human heuristics and neglecting domain gaps. To this end, we study where to introduce and how to allocate trainable parameters by proposing a novel Sensitivity-aware visual Parameter-efficient Tuning (SPT) scheme, which adaptively allocates trainable parameters to task-specific important positions given a desired tunable parameter budget. Specifically, SPT first quickly identifies the sensitive parameters that require tuning for a given task in a data-dependent way. Next, for weight matrices whose number of sensitive parameters exceeds a pre-defined threshold, SPT further boosts representational capability by applying any existing structured tuning method, e.g., LoRA or Adapter, in place of directly tuning the selected sensitive parameters (unstructured tuning), while staying within the budget. Extensive experiments on a wide range of downstream recognition tasks show that SPT is complementary to existing VPET methods and substantially boosts their performance; e.g., with a supervised pre-trained ViT-B/16 backbone, SPT improves Adapter by 4.2% and 1.4% mean Top-1 accuracy on the FGVC and VTAB-1k benchmarks, respectively, reaching state-of-the-art performance on both. Source code is available at https://github.com/ziplab/SPT
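The two-stage allocation described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation (see the linked repository for that). It assumes, purely for illustration, that sensitivity is scored by accumulated squared gradients, since the abstract does not state the criterion, and it ignores the parameter cost of the structured modules themselves. The function names `squared_gradient_scores` and `allocate_tuning`, and the layer names in the usage note, are hypothetical.

```python
from collections import defaultdict


def squared_gradient_scores(grad_batches):
    """Accumulate squared gradients per parameter across batches.

    ASSUMPTION: squared gradient magnitude stands in for "sensitivity";
    the abstract only says the scores are computed in a data-dependent way.
    grad_batches: list of dicts, each mapping a weight-matrix name to a
    flat list of gradient values for one batch.
    """
    scores = {}
    for grads in grad_batches:
        for name, values in grads.items():
            acc = scores.setdefault(name, [0.0] * len(values))
            for i, g in enumerate(values):
                acc[i] += g * g
    return scores


def allocate_tuning(scores, budget, threshold):
    """Pick the `budget` highest-scoring parameters, then choose a mode per matrix.

    A matrix holding more than `threshold` selected parameters is marked for
    structured tuning (e.g. LoRA or Adapter); otherwise only its selected
    entries are tuned directly (unstructured tuning).
    """
    # Rank every parameter of every matrix globally by its score.
    ranked = sorted(
        ((s, name, i) for name, vals in scores.items() for i, s in enumerate(vals)),
        key=lambda item: item[0],
        reverse=True,
    )
    # Group the top-`budget` parameters by the matrix they belong to.
    selected = defaultdict(list)
    for _, name, i in ranked[:budget]:
        selected[name].append(i)
    # Decide structured vs. unstructured tuning for each touched matrix.
    plan = {}
    for name, idxs in selected.items():
        if len(idxs) > threshold:
            plan[name] = ("structured", None)
        else:
            plan[name] = ("unstructured", sorted(idxs))
    return plan
```

For example, calling `allocate_tuning(scores, budget=4, threshold=2)` on scores for two matrices named `attn.qkv` and `mlp.fc1` returns a dictionary that maps each matrix containing at least one selected parameter to either `("structured", None)` or `("unstructured", [indices])`. Matrices with no selected parameters are left out of the plan and stay frozen.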

updated: Wed Mar 15 2023 12:34:24 GMT+0000 (UTC)

published: Wed Mar 15 2023 12:34:24 GMT+0000 (UTC)

arXiv
