A Simple Zero-shot Prompt Weighting Technique to Improve Prompt Ensembling in Text-Image Models

James Urquhart Allingham; Jie Ren; Michael W Dusenberry; Xiuye Gu; Yin Cui; Dustin Tran; Jeremiah Zhe Liu; Balaji Lakshminarayanan


Contrastively trained text-image models have the remarkable ability to perform zero-shot classification, that is, classifying previously unseen images into categories that the model has never been explicitly trained to identify. However, these zero-shot classifiers need prompt engineering to achieve high accuracy. Prompt engineering typically requires hand-crafting a set of prompts for individual downstream tasks. In this work, we aim to automate this prompt engineering and improve zero-shot accuracy through prompt ensembling. In particular, we ask: "Given a large pool of prompts, can we automatically score the prompts and ensemble those that are most suitable for a particular downstream dataset, without needing access to labeled validation data?" We demonstrate that this is possible. In doing so, we identify several pathologies in a naive prompt scoring method, whose scores can easily be overconfident due to biases in the pre-training and test data, and we propose a novel prompt scoring method that corrects for these biases. A weighted-average prompt ensemble built with our proposed scoring method outperforms both an equal-average ensemble and hand-crafted prompts on ImageNet, 4 of its variants, and 11 fine-grained classification benchmarks, all while being fully automatic, optimization-free, and not requiring access to labeled validation data.
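To make the difference between an equal-average and a weighted-average prompt ensemble concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: the abstract does not specify the scoring method, so the per-prompt scores below are hypothetical placeholders, and turning scores into weights with a softmax is an assumption made for illustration. The toy 2-D embeddings and all function names are likewise invented for this example.

```python
import math


def normalize(vec):
    """Scale a vector to unit L2 norm."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def softmax(scores):
    """Turn arbitrary scores into non-negative weights that sum to 1."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def ensemble_class_embeddings(text_emb, weights):
    """Weighted average over prompts, one embedding per class.

    text_emb[p][c] is the (toy) text embedding of prompt p filled in
    with class name c; weights[p] is the weight of prompt p.
    """
    num_classes = len(text_emb[0])
    dim = len(text_emb[0][0])
    class_emb = []
    for c in range(num_classes):
        mixed = [
            sum(weights[p] * text_emb[p][c][d] for p in range(len(weights)))
            for d in range(dim)
        ]
        class_emb.append(normalize(mixed))
    return class_emb


def classify(image_emb, class_emb):
    """Return the index of the class with the highest cosine similarity."""
    sims = [sum(a * b for a, b in zip(image_emb, emb)) for emb in class_emb]
    return max(range(len(sims)), key=sims.__getitem__)


# Toy data (hypothetical): 2 prompts x 2 classes x 2 dimensions.
# Prompt 0 separates the classes well; prompt 1 is misleading.
text_emb = [
    [[1.0, 0.0], [0.0, 1.0]],   # prompt 0: class 0, class 1
    [[-0.6, 0.8], [1.0, 0.0]],  # prompt 1: class 0, class 1
]
image_emb = normalize([1.0, 0.2])  # an image whose true class is 0

# Equal-average ensemble: every prompt gets the same weight.
equal_weights = [0.5, 0.5]
equal_pred = classify(image_emb, ensemble_class_embeddings(text_emb, equal_weights))

# Weighted ensemble: hypothetical scores (placeholders, not computed by
# the paper's method) are mapped to weights with a softmax (an assumption).
hypothetical_scores = [2.0, 0.0]
weights = softmax(hypothetical_scores)
weighted_pred = classify(image_emb, ensemble_class_embeddings(text_emb, weights))

print("equal-average prediction:", equal_pred)
print("weighted-average prediction:", weighted_pred)
```

On these toy values the misleading prompt pulls the equal-average ensemble toward the wrong class, while down-weighting it recovers the intended one. This only illustrates the ensembling step. How the scores are obtained without labels is the contribution of the paper and is not reproduced here.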

updated: Sat Jul 15 2023 11:12:59 GMT+0000 (UTC)

published: Mon Feb 13 2023 10:19:58 GMT+0000 (UTC)

arXiv

