Learning Representation for Clustering via Prototype Scattering and Positive Sampling

Zhizhong Huang; Jie Chen; Junping Zhang; Hongming Shan

Existing deep clustering methods rely on either contrastive or non-contrastive representation learning for the downstream clustering task. Contrastive-based methods, thanks to negative pairs, learn uniform representations for clustering; however, negative pairs may inevitably lead to the class collision issue and consequently compromise clustering performance. Non-contrastive-based methods, on the other hand, avoid the class collision issue, but the resulting non-uniform representations may cause clustering to collapse. To enjoy the strengths of both worlds, this paper presents a novel end-to-end deep clustering method with prototype scattering and positive sampling, termed ProPos. Specifically, we first maximize the distance between prototypical representations with a loss named the prototype scattering loss, which improves the uniformity of representations. Second, we align one augmented view of an instance with the sampled neighbors of another view (assumed to be truly positive pairs in the embedding space) to improve within-cluster compactness, a step termed positive sampling alignment. The strengths of ProPos are avoidance of the class collision issue, uniform representations, well-separated clusters, and within-cluster compactness. Extensive experimental results demonstrate that ProPos, optimized in an end-to-end expectation-maximization framework, achieves competitive performance on moderate-scale clustering benchmark datasets and establishes new state-of-the-art performance on large-scale datasets. Source code is available at https://github.com/Hzzone/ProPos.
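
The two objectives named in the abstract can be illustrated with a minimal sketch. It is not taken from the ProPos source code: the abstract does not give the loss formulas, so the negative mean pairwise squared distance used for scattering, the Gaussian neighbor sampling with a hypothetical `sigma` parameter, and all function names below are assumptions chosen only to make the idea concrete.

```python
import math
import random


def normalize(v):
    """Scale a vector to unit L2 norm."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def prototypes(embeddings, labels, k):
    """Prototype of each cluster: normalized sum of its member embeddings.

    Assumes every cluster 0..k-1 has at least one member.
    """
    dim = len(embeddings[0])
    sums = [[0.0] * dim for _ in range(k)]
    for z, c in zip(embeddings, labels):
        for i, x in enumerate(z):
            sums[c][i] += x
    return [normalize(s) for s in sums]


def scattering_loss(protos):
    """Assumed form: negative mean pairwise squared distance between prototypes.

    Minimizing this value pushes the prototypes apart.
    """
    pairs = [(i, j) for i in range(len(protos)) for j in range(i + 1, len(protos))]
    return -sum(sq_dist(protos[i], protos[j]) for i, j in pairs) / len(pairs)


def positive_sampling_alignment(view_a, view_b, sigma=0.001, rng=random):
    """Assumed form: align each view_a embedding with a neighbor sampled
    around the matching view_b embedding (Gaussian noise, then re-normalized)."""
    total = 0.0
    for za, zb in zip(view_a, view_b):
        neighbor = normalize([x + rng.gauss(0.0, sigma) for x in zb])
        total += sq_dist(za, neighbor)
    return total / len(view_a)


if __name__ == "__main__":
    rng = random.Random(0)
    # Toy embeddings for two augmented views of four instances.
    view_a = [normalize([1.0, 0.1]), normalize([0.9, 0.0]),
              normalize([0.0, 1.0]), normalize([0.1, 0.9])]
    view_b = [normalize([1.0, 0.0]), normalize([1.0, 0.2]),
              normalize([0.1, 1.0]), normalize([0.0, 0.8])]
    labels = [0, 0, 1, 1]  # pseudo-labels, e.g. from a k-means E-step
    protos = prototypes(view_a, labels, k=2)
    print("scattering loss:", scattering_loss(protos))
    print("alignment loss:", positive_sampling_alignment(view_a, view_b, rng=rng))
```

In this sketch the pseudo-labels stand in for the expectation step of the expectation-maximization framework mentioned above, and the two losses would be minimized together in the maximization step. Sampling the positive from a small neighborhood of the other view, rather than from other instances in the batch, is what lets the alignment term avoid negative pairs.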

updated: Wed Oct 19 2022 03:41:03 GMT+0000 (UTC)

published: Tue Nov 23 2021 12:21:53 GMT+0000 (UTC)

arXiv
