Separating Boundary Points via Structural Regularization for Very Compact Clusters

Xin Ma; Won Hwa Kim

Clustering algorithms have improved significantly alongside deep neural networks, which provide effective representations of data. Existing methods are built on deep autoencoders and a self-training process that leverages the distribution of cluster assignments of samples. However, because the fundamental objective of an autoencoder is efficient data reconstruction, the learnt space may be sub-optimal for clustering. Moreover, such methods require highly effective codes (i.e., representations) of the data; otherwise the initial cluster centers often cause stability issues during self-training. Many state-of-the-art clustering algorithms use convolution operations to extract efficient codes, but their applications are limited to image data. In this regard, we propose an end-to-end deep clustering algorithm, Very Compact Clusters (VCC). VCC takes advantage of the distributions of local relationships of samples near cluster boundaries, so that these samples can be properly separated and pulled toward cluster centers to form compact clusters. Experimental results on various datasets show that the proposed approach achieves competitive clustering performance against most state-of-the-art clustering methods on both image and non-image data, and that its results can easily be inspected qualitatively in the learnt low-dimensional space.
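
The self-training process used by existing methods can be illustrated with a minimal sketch. It assumes the widely used scheme in which soft cluster assignments come from a Student's t kernel and are sharpened into a target distribution; the abstract does not specify this form, and the sketch is not the VCC objective. The function names and toy data are hypothetical.

```python
import math

def soft_assign(codes, centers, alpha=1.0):
    """Soft assignment q[i][j] of code i to center j (Student's t kernel, assumed)."""
    q = []
    for z in codes:
        row = []
        for mu in centers:
            d2 = sum((a - b) ** 2 for a, b in zip(z, mu))  # squared distance
            row.append((1.0 + d2 / alpha) ** (-(alpha + 1.0) / 2.0))
        total = sum(row)
        q.append([v / total for v in row])  # normalize over clusters
    return q

def target_distribution(q):
    """Self-training target: square q, divide by soft cluster frequency, renormalize."""
    k = len(q[0])
    freq = [sum(row[j] for row in q) for j in range(k)]
    p = []
    for row in q:
        w = [row[j] ** 2 / freq[j] for j in range(k)]
        total = sum(w)
        p.append([v / total for v in w])
    return p

def kl_divergence(p, q):
    """KL(P || Q) summed over all samples; the quantity self-training minimizes."""
    return sum(pi * math.log(pi / qi)
               for p_row, q_row in zip(p, q)
               for pi, qi in zip(p_row, q_row))

# Toy 2-D codes and two initial centers (illustrative values only).
codes = [[-1.2, 0.1], [-0.8, -0.3], [0.1, 0.0], [0.9, 0.2], [1.1, -0.1]]
centers = [[-1.0, 0.0], [1.0, 0.0]]

q = soft_assign(codes, centers)
p = target_distribution(q)
loss = kl_divergence(p, q)
```

Because the target is derived from the current assignments, poorly placed initial centers are reinforced rather than corrected, which is one way to read the stability issue the abstract mentions.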

updated: Thu Sep 16 2021 03:30:43 GMT+0000 (UTC)

published: Wed Jun 09 2021 23:22:03 GMT+0000 (UTC)

arXiv
