Exploring the Limits of Deep Image Clustering using Pretrained Models

Nikolas Adaloglou; Felix Michels; Hamza Kalisch; Markus Kollmann

We present a general methodology that learns to classify images without labels by leveraging pretrained feature extractors. Our approach involves self-distillation training of clustering heads, based on the fact that nearest neighbors in the pretrained feature space are likely to share the same label. We propose a novel objective to learn associations between images by introducing a variant of pointwise mutual information together with instance weighting. We demonstrate that the proposed objective is able to attenuate the effect of false positive pairs while efficiently exploiting the structure in the pretrained feature space. As a result, we improve the clustering accuracy over k-means on 17 different pretrained models by 6.1% and 12.2% on ImageNet and CIFAR100, respectively. Finally, using self-supervised pretrained vision transformers, we push the clustering accuracy on ImageNet to 61.6%. The code will be open-sourced.
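
The premise stated in the abstract, that nearest neighbors in a pretrained feature space tend to share a label, can be illustrated with a minimal sketch. This is not the authors' code: the function names `nearest_neighbors` and `neighbor_label_agreement` are hypothetical, cosine similarity is an assumed choice of metric, and the two synthetic Gaussian blobs merely stand in for features from a real pretrained extractor. Labels are used only to measure agreement, not to find neighbors.

```python
import numpy as np

def nearest_neighbors(features, k):
    """Return indices of the k nearest neighbors (cosine similarity) per row."""
    # L2-normalize so that the dot product equals cosine similarity.
    normed = features / np.linalg.norm(features, axis=1, keepdims=True)
    sim = normed @ normed.T
    np.fill_diagonal(sim, -np.inf)  # exclude each point from its own neighbor list
    return np.argsort(-sim, axis=1)[:, :k]

def neighbor_label_agreement(neighbors, labels):
    """Fraction of (image, neighbor) pairs that share a label."""
    return float(np.mean(labels[neighbors] == labels[:, None]))

# Toy stand-in for pretrained features (assumption): two well-separated blobs.
rng = np.random.default_rng(0)
centers = np.array([[5.0, 0.0], [0.0, 5.0]])
labels = np.repeat([0, 1], 50)
features = centers[labels] + 0.1 * rng.standard_normal((100, 2))

neighbors = nearest_neighbors(features, k=5)
agreement = neighbor_label_agreement(neighbors, labels)
print(f"neighbor/label agreement: {agreement:.3f}")
```

On real features the agreement would be below 1, and the disagreeing pairs correspond to the false positive pairs whose effect the proposed objective is said to attenuate.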

updated: Fri Mar 31 2023 08:56:29 GMT+0000 (UTC)

published: Fri Mar 31 2023 08:56:29 GMT+0000 (UTC)

arXiv
