LaFTer: Label-Free Tuning of Zero-shot Classifier using Language and Unlabeled Image Collections

M. Jehanzeb Mirza; Leonid Karlinsky; Wei Lin; Mateusz Kozinski; Horst Possegger; Rogerio Feris; Horst Bischof

Recently, large-scale pre-trained Vision and Language (VL) models have set a new state-of-the-art (SOTA) in zero-shot visual classification, enabling open-vocabulary recognition of a potentially unlimited set of categories defined as simple language prompts. However, despite these great advances, the performance of these zero-shot classifiers still falls short of the results of dedicated (closed category set) classifiers trained with supervised fine-tuning. In this paper we show, for the first time, how to reduce this gap without any labels and without any paired VL data, using an unlabeled image collection and a set of texts auto-generated by a Large Language Model (LLM). These texts describe the categories of interest and effectively substitute for labeled visual instances of those categories. Using our label-free approach, we attain significant performance improvements over the zero-shot performance of the base VL model and other contemporary methods and baselines on a wide variety of datasets, demonstrating absolute improvement of up to 11.7% (3.8% on average) in the label-free setting. Moreover, despite our approach being label-free, we observe 1.3% average gains over leading few-shot prompting baselines that do use 5-shot supervision.
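To make the idea of text standing in for labeled images more concrete, the following is a minimal sketch, not the paper's actual training procedure. It assumes a VL model that embeds text and images into a shared space, and it replaces the real encoders with hand-written 3-dimensional toy vectors. The names `class_prototypes`, `classify`, and `TOY_TEXT_EMBEDDINGS` are hypothetical and exist only for this illustration. Each class prototype is built purely from text embeddings (as LLM-generated descriptions might provide), and an image embedding is then assigned to the most similar prototype.

```python
import math


def normalize(v):
    """Scale a vector to unit L2 norm."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def class_prototypes(text_embeddings):
    """Average the normalized text embeddings of each class into one unit prototype."""
    prototypes = {}
    for label, vectors in text_embeddings.items():
        unit = [normalize(v) for v in vectors]
        mean = [sum(col) / len(unit) for col in zip(*unit)]
        prototypes[label] = normalize(mean)
    return prototypes


def classify(image_embedding, prototypes):
    """Return the label whose prototype has the highest cosine similarity."""
    img = normalize(image_embedding)
    scores = {
        label: sum(a * b for a, b in zip(img, proto))
        for label, proto in prototypes.items()
    }
    return max(scores, key=scores.get)


# Assumption: toy vectors standing in for text-encoder outputs of
# several auto-generated descriptions per category.
TOY_TEXT_EMBEDDINGS = {
    "cat": [[0.9, 0.1, 0.0], [0.8, 0.2, 0.1]],
    "dog": [[0.1, 0.9, 0.0], [0.2, 0.8, 0.1]],
}

PROTOTYPES = class_prototypes(TOY_TEXT_EMBEDDINGS)

# Assumption: a toy vector standing in for an image-encoder output.
prediction = classify([0.7, 0.3, 0.05], PROTOTYPES)
print(prediction)
```

No image labels are used anywhere in this sketch: the only supervision comes from the class names attached to the text descriptions, which is the sense in which text can substitute for labeled visual instances.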

updated: Mon May 29 2023 17:56:35 GMT+0000 (UTC)

published: Mon May 29 2023 17:56:35 GMT+0000 (UTC)

arXiv
