Learning to Compose Soft Prompts for Compositional Zero-Shot Learning

Nihal V. Nayak; Peilin Yu; Stephen H. Bach

We introduce compositional soft prompting (CSP), a parameter-efficient learning technique to improve the zero-shot compositionality of large-scale pretrained vision-language models (VLMs) like CLIP. We develop CSP for compositional zero-shot learning, the task of predicting unseen attribute-object compositions (e.g., old cat and young tiger). VLMs have a flexible text encoder that can represent arbitrary classes as natural language prompts, but they often underperform task-specific architectures on compositional zero-shot benchmark datasets. CSP treats the attributes and objects that define classes as learnable tokens of vocabulary. During training, the vocabulary is tuned to recognize classes that compose tokens in multiple ways (e.g., old cat and white cat). At test time, we recompose the learned attribute-object vocabulary in new combinations to recognize novel classes. We show that CSP outperforms CLIP on benchmark datasets by an average of 10.9 percentage points on AUC. CSP also outperforms CoOp, a soft prompting method that fine-tunes the prefix context tokens, by an average of 5.8 percentage points on AUC. We perform additional experiments to show that CSP improves generalization to higher-order attribute-attribute-object compositions (e.g., old white cat) and to combinations of pretrained attributes and fine-tuned objects. The code is available at https://github.com/BatsResearch/csp.
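The recomposition idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation (see the linked repository for that): the mean-pooling "text encoder", the random vectors, the dimension, and every name below (`vocab`, `prefix`, `encode_prompt`, `predict`) are assumptions made for illustration, and the training step that would tune the vocabulary is omitted. It only shows how a single shared attribute-object vocabulary can be recombined to score a pair that was never seen as a class.

```python
import math
import random

random.seed(0)
DIM = 8  # toy embedding size (assumption; real VLM embeddings are far larger)

def rand_vec():
    """Random vector standing in for a token embedding."""
    return [random.gauss(0.0, 1.0) for _ in range(DIM)]

attributes = ["old", "young", "white"]
objects = ["cat", "tiger"]

# One vector per attribute and per object. In CSP these would be the
# tuned vocabulary; here they are random placeholders.
vocab = {word: rand_vec() for word in attributes + objects}

# Fixed context vectors standing in for a prefix such as "a photo of".
prefix = [rand_vec() for _ in range(3)]

def encode_prompt(attr, obj):
    """Toy stand-in for a text encoder: mean-pool prefix + [attr] + [obj]."""
    tokens = prefix + [vocab[attr], vocab[obj]]
    return [sum(t[i] for t in tokens) / len(tokens) for i in range(DIM)]

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def predict(image_vec, candidate_pairs):
    """Return the (attribute, object) pair whose prompt best matches the image."""
    return max(candidate_pairs,
               key=lambda pair: cosine(image_vec, encode_prompt(*pair)))

# Candidate classes at test time include pairs never used as training classes.
all_pairs = [(a, o) for a in attributes for o in objects]

# Hypothetical image embedding; a real system would use an image encoder.
image_vec = encode_prompt("old", "tiger")
print(predict(image_vec, all_pairs))
```

Because every class prompt is built from the same attribute and object vectors, adding a new combination at test time requires no new parameters, only a new pairing of existing entries.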

updated: Mon Apr 24 2023 15:46:28 GMT+0000 (UTC)

published: Thu Apr 07 2022 16:51:12 GMT+0000 (UTC)

arXiv
