Information Maximization Clustering via Multi-View Self-Labelling

Foivos Ntelemis; Yaochu Jin; Spencer A. Thomas

Image clustering is a particularly challenging computer vision task that aims to generate annotations without human supervision. Recent advances focus on self-supervised learning strategies for image clustering, first learning valuable semantics and then clustering the image representations. These multiple-phase algorithms, however, increase computation time, and their final performance depends on the first stage. By extending the self-supervised approach, we propose a novel single-phase clustering method that simultaneously learns meaningful representations and assigns the corresponding annotations. This is achieved by integrating a discrete representation into the self-supervised paradigm through a classifier net. Specifically, the proposed clustering objective employs mutual information and maximizes the dependency between the integrated discrete representation and a discrete probability distribution. The discrete probability distribution is derived through the self-supervised process by comparing the learnt latent representation with a set of trainable prototypes. To enhance the learning performance of the classifier, we jointly apply the mutual information across multi-crop views. Our empirical results show that the proposed framework outperforms state-of-the-art techniques, with average accuracies of 89.1% and 49.0% on the CIFAR-10 and CIFAR-100/20 datasets, respectively. Finally, the proposed method is also robust to parameter settings, making it readily applicable to other datasets.
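The abstract does not spell out the objective, so the following is only a minimal sketch of the two ingredients it names: a discrete distribution obtained by comparing latent vectors with prototypes, and a mutual-information score between that distribution and a classifier's output. The function names (`prototype_assignments`, `mutual_information`), the cosine similarity, the temperature value, and the batch-level plug-in estimate of the joint distribution are all assumptions made for illustration, not the authors' implementation.

```python
import numpy as np


def softmax(logits, axis=-1):
    """Numerically stable softmax along the given axis."""
    shifted = logits - logits.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def prototype_assignments(latents, prototypes, temperature=0.1):
    """Soft assignment of each latent vector to a set of prototypes.

    Assumption: cosine similarity followed by a temperature-scaled
    softmax; the abstract only says latents are "compared" with
    trainable prototypes.
    latents: (n, d), prototypes: (m, d) -> returns (n, m), rows sum to 1.
    """
    z = latents / np.linalg.norm(latents, axis=1, keepdims=True)
    c = prototypes / np.linalg.norm(prototypes, axis=1, keepdims=True)
    return softmax(z @ c.T / temperature, axis=1)


def mutual_information(p, q, eps=1e-12):
    """Plug-in estimate of I(P; Q) from two batches of soft assignments.

    p: (n, K) classifier probabilities, q: (n, M) prototype probabilities.
    The joint distribution is estimated by averaging outer products over
    the batch (an assumption of this sketch).
    """
    joint = p.T @ q / p.shape[0]                 # (K, M), entries sum to 1
    marg_p = joint.sum(axis=1, keepdims=True)    # (K, 1)
    marg_q = joint.sum(axis=0, keepdims=True)    # (1, M)
    log_ratio = np.log(joint + eps) - np.log(marg_p + eps) - np.log(marg_q + eps)
    return float(np.sum(joint * log_ratio))
```

In a training loop, a quantity of this kind would be maximized (for example by minimizing its negative) and, following the abstract, summed over several crop views of the same image. This NumPy version only evaluates the score and is not differentiable as written.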

updated: Mon Oct 18 2021 21:14:16 GMT+0000 (UTC)

published: Fri Mar 12 2021 16:04:41 GMT+0000 (UTC)

arXiv
