CDUL: CLIP-Driven Unsupervised Learning for Multi-Label Image Classification

Rabab Abdelfattah; Qing Guo; Xiaoguang Li; Xiaofeng Wang; Song Wang

This paper presents a CLIP-based unsupervised learning method for annotation-free multi-label image classification. The method has three stages: initialization, training, and inference. At the initialization stage, we take full advantage of the powerful CLIP model and propose a novel approach that extends CLIP to multi-label prediction through global-local image-text similarity aggregation. Specifically, we split each image into snippets and use CLIP to generate a similarity vector for the whole image (global) and for each snippet (local). A similarity aggregator then combines the global and local similarity vectors. At the training stage, we use the aggregated similarity scores as the initial pseudo labels and propose an optimization framework that trains the parameters of the classification network and refines the pseudo labels for unobserved labels. During inference, only the classification network is used to predict the labels of the input image. Extensive experiments show that our method outperforms state-of-the-art unsupervised methods on the MS-COCO, PASCAL VOC 2007, PASCAL VOC 2012, and NUS datasets, and even achieves results comparable to weakly supervised classification methods.
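The initialization stage described above can be illustrated with a minimal sketch. The abstract does not specify how the aggregator combines the vectors, so the rule below is an assumption chosen for illustration. For each class it takes the maximum snippet score when that score clears a threshold and the minimum otherwise, then averages the result with the global score. The function names, `threshold`, `temperature`, and the random vectors standing in for CLIP image and text embeddings are all hypothetical.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def similarity_vector(image_emb, text_embs, temperature=0.01):
    """One score per class: softmax over scaled image-text cosine similarities.

    `temperature` is an assumed scaling constant, not a value from the paper.
    """
    return softmax([cosine(image_emb, t) / temperature for t in text_embs])


def aggregate(global_sim, local_sims, threshold=0.5):
    """Fuse the global vector with per-snippet vectors (assumed rule).

    For each class, use the highest snippet score if it reaches `threshold`,
    otherwise the lowest, then average with the global score.
    """
    fused = []
    for c, g in enumerate(global_sim):
        column = [snippet[c] for snippet in local_sims]
        hi, lo = max(column), min(column)
        local = hi if hi >= threshold else lo
        fused.append(0.5 * (g + local))
    return fused


# Random vectors stand in for CLIP embeddings (assumption for the demo).
rng = random.Random(0)
num_classes, dim, num_snippets = 5, 16, 9


def rand_vec():
    return [rng.gauss(0.0, 1.0) for _ in range(dim)]


text_embs = [rand_vec() for _ in range(num_classes)]      # one per class prompt
global_emb = rand_vec()                                    # whole image
snippet_embs = [rand_vec() for _ in range(num_snippets)]  # image crops

global_sim = similarity_vector(global_emb, text_embs)
local_sims = [similarity_vector(e, text_embs) for e in snippet_embs]
pseudo_labels = aggregate(global_sim, local_sims)
print(pseudo_labels)
```

In a real pipeline the embeddings would come from a CLIP image encoder applied to the full image and to each snippet, and from a CLIP text encoder applied to one prompt per class. The fused scores would then serve as the initial pseudo labels for training the classification network.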

updated: Thu Mar 07 2024 05:05:15 GMT+0000 (UTC)

published: Mon Jul 31 2023 13:12:02 GMT+0000 (UTC)

arXiv
