Subspace Clustering using Ensembles of K-Subspaces

John Lipor; David Hong; Yan Shuo Tan; Laura Balzano

Subspace clustering is the unsupervised grouping of points lying near a union of low-dimensional linear subspaces. Algorithms based directly on geometric properties of such data tend to provide poor empirical performance, lack theoretical guarantees, or depend heavily on their initialization. We present a novel geometric approach to the subspace clustering problem that leverages ensembles of the K-subspaces (KSS) algorithm via the evidence accumulation clustering framework. Our algorithm, referred to as ensemble K-subspaces (EKSS), forms a co-association matrix whose (i,j)th entry is the number of times points i and j are clustered together by several runs of KSS with random initializations. We prove general recovery guarantees for any algorithm that forms an affinity matrix with entries close to a monotonic transformation of pairwise absolute inner products. We then show that a specific instance of EKSS results in an affinity matrix with entries of this form, and hence our proposed algorithm can provably recover subspaces under conditions similar to those of state-of-the-art algorithms. This finding is, to the best of our knowledge, the first recovery guarantee for evidence accumulation clustering and for KSS variants. We show on synthetic data that our method performs well in the traditionally challenging settings of subspaces with large intersection, subspaces with small principal angles, and noisy data. Finally, we evaluate our algorithm on six common benchmark datasets and show that, unlike existing methods, EKSS achieves excellent empirical performance both when there are few points per subspace and when there are many.
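The co-association construction described in the abstract can be illustrated with a minimal NumPy sketch. This is an assumption-laden illustration, not the authors' implementation: the function names `kss` and `ekss_coassociation`, the parameters `K` (number of subspaces), `d` (subspace dimension), `B` (number of KSS runs) and `n_iters`, and the choice of a standard alternating KSS (assign each point to the subspace with the largest projection, then refit each subspace by PCA) are all hypothetical choices made here. The abstract does not specify these details, nor how the final clusters are extracted from the matrix, so that step is omitted.

```python
import numpy as np


def kss(X, K, d, n_iters=20, rng=None):
    """One run of a basic K-subspaces iteration (illustrative assumption).

    X is D x N with data points as columns; returns N integer labels.
    """
    rng = np.random.default_rng(rng)
    D, N = X.shape
    # Random initialization: K random d-dimensional orthonormal bases.
    bases = [np.linalg.qr(rng.standard_normal((D, d)))[0] for _ in range(K)]
    labels = None
    for _ in range(n_iters):
        # Assignment step: each point goes to the subspace onto which
        # its projection has the largest norm.
        proj = np.stack([np.linalg.norm(U.T @ X, axis=0) for U in bases])
        new_labels = proj.argmax(axis=0)
        if labels is not None and np.array_equal(new_labels, labels):
            break  # assignments stopped changing
        labels = new_labels
        # Update step: refit each basis with the top-d left singular
        # vectors of the points currently assigned to it.
        for k in range(K):
            Xk = X[:, labels == k]
            if Xk.shape[1] >= d:  # keep the old basis if too few points
                U, _, _ = np.linalg.svd(Xk, full_matrices=False)
                bases[k] = U[:, :d]
    return labels


def ekss_coassociation(X, K, d, B=50, seed=0):
    """Count how often each pair of points is co-clustered over B KSS runs."""
    rng = np.random.default_rng(seed)
    N = X.shape[1]
    A = np.zeros((N, N), dtype=int)
    for _ in range(B):
        labels = kss(X, K, d, rng=rng)
        # Entry (i, j) is incremented when points i and j share a label.
        A += labels[:, None] == labels[None, :]
    return A
```

Calling `ekss_coassociation(X, K, d, B)` on a D x N data matrix returns a symmetric N x N integer matrix whose diagonal equals `B`; dividing by `B` gives the fraction of runs in which each pair was clustered together.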

updated: Wed Jan 06 2021 23:39:59 GMT+0000 (UTC)

published: Thu Sep 14 2017 12:55:56 GMT+0000 (UTC)

arXiv
