Unleashing the Power of Visual Prompting At the Pixel Level

Junyang Wu; Xianhang Li; Chen Wei; Huiyu Wang; Alan Yuille; Yuyin Zhou; Cihang Xie

This paper presents a simple and effective visual prompting method for adapting pre-trained models to downstream recognition tasks. Our method includes two key designs. First, rather than directly adding the prompt to the image, we treat the prompt as an extra and independent learnable component. We show that the strategy for reconciling the prompt and the image matters, and find that warping the prompt around a properly shrunk image empirically works best. Second, we re-introduce two "old tricks" commonly used in building transferable adversarial examples, i.e., input diversity and gradient normalization, into visual prompting. These techniques improve optimization and enable the prompt to generalize better. We provide extensive experimental results to demonstrate the effectiveness of our method. Using a CLIP model, our prompting method sets a new record of 82.8% average accuracy across 12 popular classification datasets, substantially surpassing the prior art by +5.6%. Notably, this prompting performance already outperforms linear probing by +2.1% and can even match full fine-tuning on certain datasets. In addition, our prompting method shows competitive performance across different data scales and against distribution shifts. The code is publicly available at https://github.com/UCSC-VLAA/EVP.
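To make the two designs in the abstract concrete, here is a minimal NumPy sketch; it is not the authors' implementation. It assumes a 224x224 canvas with a 30-pixel prompt border, nearest-neighbour shrinking, and a random horizontal flip as a stand-in for input diversity. All function names (`shrink_nearest`, `apply_border_prompt`, `normalized_step`, `random_flip`) and these sizes are hypothetical choices for illustration, and the gradient is a random placeholder rather than one computed from a model.

```python
import numpy as np

def shrink_nearest(image, out_size):
    """Nearest-neighbour resize of an (H, W, C) image to (out_size, out_size, C)."""
    h, w = image.shape[:2]
    rows = np.arange(out_size) * h // out_size
    cols = np.arange(out_size) * w // out_size
    return image[rows][:, cols]

def apply_border_prompt(image, prompt, pad):
    """Put the shrunk image in the centre; the prompt fills the border around it."""
    size = prompt.shape[0]
    inner = size - 2 * pad
    canvas = prompt.copy()
    canvas[pad:pad + inner, pad:pad + inner] = shrink_nearest(image, inner)
    return canvas

def border_mask(size, pad):
    """1 on the border where the prompt lives, 0 on the centre image region."""
    mask = np.ones((size, size, 1))
    mask[pad:size - pad, pad:size - pad] = 0.0
    return mask

def normalized_step(prompt, grad, lr, pad, eps=1e-12):
    """Gradient-normalized update: divide the gradient by its L2 norm, then step."""
    grad = grad * border_mask(prompt.shape[0], pad)
    return prompt - lr * grad / (np.linalg.norm(grad) + eps)

def random_flip(image, rng):
    """Assumed input-diversity augmentation: flip horizontally half the time."""
    return image[:, ::-1] if rng.random() < 0.5 else image

rng = np.random.default_rng(0)
image = rng.random((224, 224, 3))
prompt = np.zeros((224, 224, 3))

# Prompted input: augmented image shrunk to 164x164, surrounded by the prompt.
x = apply_border_prompt(random_flip(image, rng), prompt, pad=30)

# Placeholder gradient; in practice this would come from backpropagation.
fake_grad = rng.standard_normal(prompt.shape)
new_prompt = normalized_step(prompt, fake_grad, lr=1.0, pad=30)
```

Because the gradient is rescaled to unit L2 norm, the size of each update in this sketch is set by `lr` alone, regardless of the raw gradient magnitude.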

updated: Wed Mar 29 2023 06:49:51 GMT+0000 (UTC)

published: Tue Dec 20 2022 18:57:06 GMT+0000 (UTC)

arXiv
