CLC: Cluster Assignment via Contrastive Representation Learning

Fei Ding; Dan Zhang; Yin Yang; Venkat Krovi; Feng Luo

Clustering, the task of grouping samples into clusters without manual annotations, remains important and challenging. Recent works have achieved excellent results on small datasets by clustering feature representations learned through self-supervised learning. However, for datasets with a large number of clusters, such as ImageNet, current methods still cannot achieve high clustering performance. In this paper, we propose Contrastive Learning-based Clustering (CLC), which uses contrastive learning to learn cluster assignments directly. We decompose the representation into two parts: one encodes the categorical information under an equipartition constraint, and the other captures the instance-wise factors. We propose a contrastive loss that uses both parts of the representation. We theoretically analyze the proposed contrastive loss and show that CLC assigns different weights to the negative samples while learning cluster assignments. Further gradient analysis shows that the larger weights tend to fall on the hard negative samples. The proposed loss is therefore highly expressive, which enables us to learn cluster assignments efficiently. Experimental evaluation shows that CLC achieves overall state-of-the-art or highly competitive clustering performance on multiple benchmark datasets. In particular, we achieve 53.4% accuracy on the full ImageNet dataset and outperform existing methods by a large margin (+10.2%).
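The abstract does not give the exact form of the loss, so the following is only a minimal sketch of the general idea: split each representation into a categorical part and an instance-wise part, then score pairs with an InfoNCE-style contrastive loss that uses both. The function names, the softmax and L2 normalization, the product similarity, and the temperature value are all illustrative assumptions rather than the authors' formulation, and the equipartition constraint is omitted entirely.

```python
import math


def softmax(logits):
    """Turn raw scores into a probability vector (soft cluster assignment)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def l2_normalize(vec):
    """Scale a vector to unit length; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


def split_representation(features, num_clusters):
    """Hypothetical split: the first `num_clusters` entries form the
    categorical part, the remaining entries form the instance-wise part."""
    categorical = softmax(features[:num_clusters])
    instance = l2_normalize(features[num_clusters:])
    return categorical, instance


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def similarity(rep_a, rep_b):
    """Assumed pair score: categorical agreement times instance cosine
    similarity. This combination is a guess, not taken from the abstract."""
    return dot(rep_a[0], rep_b[0]) * dot(rep_a[1], rep_b[1])


def contrastive_loss(anchor, positive, negatives, temperature=0.5):
    """InfoNCE-style loss: low when the anchor scores higher with its
    positive than with any of the negatives."""
    pos = math.exp(similarity(anchor, positive) / temperature)
    neg = sum(math.exp(similarity(anchor, n) / temperature) for n in negatives)
    return -math.log(pos / (pos + neg))


if __name__ == "__main__":
    # Two augmented views of one sample, plus one sample from another cluster.
    anchor = split_representation([4.0, 0.0, 1.0, 0.0], num_clusters=2)
    positive = split_representation([4.0, 0.0, 0.9, 0.1], num_clusters=2)
    negative = split_representation([0.0, 4.0, 0.0, 1.0], num_clusters=2)
    print(contrastive_loss(anchor, positive, [negative]))
```

In this toy setup, a negative that shares the anchor's cluster assignment receives a larger categorical factor in its score, which loosely mirrors the abstract's remark that negatives are weighted differently while cluster assignments are learned.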

updated: Thu Jun 08 2023 07:15:13 GMT+0000 (UTC)

published: Thu Jun 08 2023 07:15:13 GMT+0000 (UTC)

arXiv
