K-Splits: Improved K-Means Clustering Algorithm to Automatically Detect the Number of Clusters

Seyed Omid Mohammadi; Ahmad Kalhor; Hossein Bodaghi

This paper introduces k-splits, an improved hierarchical algorithm based on k-means that clusters data without prior knowledge of the number of clusters. K-splits starts from a small number of clusters and, where needed, incrementally splits them into better-fitting clusters along the most significant axis of the data distribution. Accuracy and speed are the two main advantages of the proposed method. We experiment on six synthetic benchmark datasets and two real-world datasets, MNIST and Fashion-MNIST, and show that the algorithm finds the correct number of clusters with excellent accuracy under different conditions. We also show that k-splits is faster than similar methods and can even be faster than standard k-means in lower dimensions. Finally, we suggest using k-splits to locate the centroids and then passing them to the k-means algorithm as initial points to fine-tune the results.
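The core idea can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that "the most significant data distribution axis" can be approximated by the dominant eigenvector of the cluster's covariance matrix, and that a cluster is cut at its mean along that axis. The abstract does not say how the algorithm decides whether to keep a split, so that step is left out. The function names (`principal_axis`, `split_along_axis`, `refine_with_kmeans`) are hypothetical.

```python
import math
import random


def mean(points):
    """Component-wise mean of a list of equal-length tuples."""
    n = len(points)
    return tuple(sum(p[d] for p in points) / n for d in range(len(points[0])))


def principal_axis(points, iters=100):
    """Approximate the dominant covariance eigenvector by power iteration."""
    dim = len(points[0])
    mu = mean(points)
    centered = [tuple(p[d] - mu[d] for d in range(dim)) for p in points]
    # Unnormalised covariance matrix; scaling does not change the direction.
    cov = [[sum(c[i] * c[j] for c in centered) for j in range(dim)]
           for i in range(dim)]
    rng = random.Random(0)  # fixed seed so the sketch is reproducible
    v = [rng.random() + 0.1 for _ in range(dim)]
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(dim)) for i in range(dim)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:  # all points coincide; there is no meaningful axis
            break
        v = [x / norm for x in w]
    return tuple(v)


def split_along_axis(points):
    """Split one cluster in two at its mean along the principal axis (assumed rule)."""
    mu = mean(points)
    axis = principal_axis(points)
    left, right = [], []
    for p in points:
        projection = sum((p[d] - mu[d]) * axis[d] for d in range(len(p)))
        (left if projection < 0 else right).append(p)
    return left, right


def refine_with_kmeans(points, centroids, iters=10):
    """Plain Lloyd iterations started from the given centroids."""
    for _ in range(iters):
        groups = [[] for _ in centroids]
        for p in points:
            nearest = min(
                range(len(centroids)),
                key=lambda k: sum((a - b) ** 2 for a, b in zip(p, centroids[k])),
            )
            groups[nearest].append(p)
        # Keep the old centroid if a group ends up empty.
        centroids = [mean(g) if g else c for g, c in zip(groups, centroids)]
    return centroids


# Usage: two well-separated blobs treated as a single starting cluster.
data = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 0), (10, 1), (11, 0), (11, 1)]
left, right = split_along_axis(data)
seeds = [mean(left), mean(right)]
refined = refine_with_kmeans(data, seeds)
```

The last three lines mirror the abstract's closing suggestion: the centroids found by splitting are passed to k-means as initial points rather than as a final answer.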

updated: Tue May 24 2022 05:40:04 GMT+0000 (UTC)

published: Sat Oct 09 2021 23:02:57 GMT+0000 (UTC)

arXiv
