Leveraging Multiple Descriptive Features for Robust Few-shot Image Learning

Zhili Feng; Anna Bair; J. Zico Kolter

Modern image classification is based on directly predicting class labels via large discriminative networks, which makes it difficult to assess the intuitive visual "features" that may constitute a classification decision. At the same time, recent work on joint vision-language models such as CLIP provides ways to specify natural-language descriptions of image classes, but typically focuses on a single description per class. In this work, we demonstrate that an alternative approach, arguably closer to our understanding of each class as having multiple "visual features", can also deliver compelling performance in the robust few-shot learning setting. Specifically, we use a large language model (LLM) to automatically enumerate multiple visual descriptions of each class, then use a vision-language model to translate these descriptions into a set of visual features for each image. Finally, we use sparse logistic regression to select a relevant subset of these features with which to classify each image. This approach both provides an "intuitive" set of relevant features for each class and, in the few-shot learning setting, outperforms standard approaches such as linear probing. We also show that, when combined with finetuning, the method outperforms existing state-of-the-art finetuning approaches in terms of both in-distribution and out-of-distribution performance.
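
The feature-construction and feature-selection steps above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation. It assumes each per-image feature is the cosine similarity between a CLIP-style image embedding and the text embedding of one LLM-generated description. It replaces real embeddings with random toy vectors (`make_toy_data` and the other helper names are hypothetical) and omits the LLM step entirely. For the sparse logistic regression, it uses plain proximal gradient descent (ISTA) on an L1-penalized multinomial loss rather than any particular library solver.

```python
import math
import random

# Assumption (not from the source): each "visual feature" is the cosine
# similarity between an image embedding and the text embedding of one
# LLM-generated class description, as a CLIP-style model would produce.
def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def description_features(image_embs, desc_embs):
    """One feature per description for every image."""
    return [[cosine(img, d) for d in desc_embs] for img in image_embs]

def make_toy_data(rng, desc_embs, descs_per_class, n_per_class, noise=0.3):
    """Hypothetical stand-in for real image embeddings: an image of class c is
    the sum of its class's description embeddings plus Gaussian noise."""
    X, y = [], []
    for c in range(len(desc_embs) // descs_per_class):
        own = desc_embs[c * descs_per_class:(c + 1) * descs_per_class]
        for _ in range(n_per_class):
            X.append([sum(col) + rng.gauss(0, noise) for col in zip(*own)])
            y.append(c)
    return X, y

def softmax(z):
    m = max(z)  # subtract the max for numerical stability
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def soft_threshold(w, t):
    """Proximal operator of t * |w|: shrinks w toward zero, clipping small values to 0."""
    if w > t:
        return w - t
    if w < -t:
        return w + t
    return 0.0

def fit_sparse_logreg(X, y, n_classes, lam=0.01, lr=0.5, steps=500):
    """Multinomial logistic regression with an L1 penalty (strength lam) on the
    weights, fit by proximal gradient descent (ISTA); biases are unpenalized."""
    n, d = len(X), len(X[0])
    W = [[0.0] * d for _ in range(n_classes)]
    b = [0.0] * n_classes
    for _ in range(steps):
        gW = [[0.0] * d for _ in range(n_classes)]
        gb = [0.0] * n_classes
        for xi, yi in zip(X, y):
            p = softmax([sum(w * v for w, v in zip(W[k], xi)) + b[k] for k in range(n_classes)])
            for k in range(n_classes):
                err = (p[k] - (1.0 if k == yi else 0.0)) / n  # gradient of mean cross-entropy
                gb[k] += err
                for j in range(d):
                    gW[k][j] += err * xi[j]
        for k in range(n_classes):
            b[k] -= lr * gb[k]
            # Gradient step on the smooth loss, then the L1 proximal step.
            W[k] = [soft_threshold(W[k][j] - lr * gW[k][j], lr * lam) for j in range(d)]
    return W, b

def predict(W, b, x):
    scores = [sum(w * v for w, v in zip(Wk, x)) + bk for Wk, bk in zip(W, b)]
    return max(range(len(scores)), key=scores.__getitem__)

rng = random.Random(0)
dim, n_classes, per_class = 16, 2, 3
# Hypothetical text embeddings for per_class descriptions of each class.
desc_embs = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(n_classes * per_class)]
train_imgs, ytr = make_toy_data(rng, desc_embs, per_class, n_per_class=5)  # 5-shot
test_imgs, yte = make_toy_data(rng, desc_embs, per_class, n_per_class=20)
Xtr = description_features(train_imgs, desc_embs)
Xte = description_features(test_imgs, desc_embs)

W, b = fit_sparse_logreg(Xtr, ytr, n_classes)
acc = sum(predict(W, b, x) == t for x, t in zip(Xte, yte)) / len(yte)
selected = {k: [j for j, w in enumerate(W[k]) if w != 0.0] for k in range(n_classes)}
print(f"test accuracy: {acc:.2f}")
print("descriptions with nonzero weight, per class:", selected)
```

The L1 proximal step (`soft_threshold`) sets small weights exactly to zero. As a result, the nonzero entries in each row of `W` identify the descriptions the classifier relies on for that class. Increasing `lam` generally produces a sparser, more interpretable selection at some cost in fit.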

updated: Mon Jul 10 2023 03:06:45 GMT+0000 (UTC)

published: Mon Jul 10 2023 03:06:45 GMT+0000 (UTC)

arXiv
