Inv-SENnet: Invariant Self Expression Network for clustering under biased data

Ashutosh Singh; Ashish Singh; Aria Masoomi; Tales Imbiriba; Erik Learned-Miller; Deniz Erdogmus

Subspace clustering algorithms are used to uncover the cluster structure that best explains a dataset. These methods are used extensively for data-exploration tasks in various areas of the natural sciences. However, most of them fail to handle unwanted biases in datasets. For datasets where a data sample represents multiple attributes, naively applying a clustering approach can result in undesired output. To this end, we propose a novel framework for jointly removing unwanted attributes (biases) while learning to cluster data points in individual subspaces. Assuming we have information about the bias, we regularize the clustering method by adversarially learning to minimize the mutual information between the data and the unwanted attributes. Our experimental results on synthetic and real-world datasets demonstrate the effectiveness of our approach.
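The combination described in the abstract, a self-expression clustering term regularized by an adversary that tries to recover the bias, can be illustrated with a rough sketch. This is not the authors' implementation. It assumes a standard self-expressive loss (each embedded point is reconstructed from the other points), a linear softmax adversary, and the adversary's cross-entropy as a stand-in for the mutual-information term. All names here (`self_expression_loss`, `adversary_cross_entropy`, `encoder_objective`, `lam`, `gamma`) are hypothetical.

```python
import numpy as np


def self_expression_loss(Z, C, gamma=1.0):
    """0.5 * ||Z - C Z||_F^2 + gamma * ||C||_1, with the diagonal of C zeroed.

    Z: (n, d) embedded data points; C: (n, n) self-expression coefficients.
    """
    C = C - np.diag(np.diag(C))  # a point may not be used to represent itself
    reconstruction = 0.5 * np.sum((Z - C @ Z) ** 2)
    return reconstruction + gamma * np.abs(C).sum()


def adversary_cross_entropy(Z, W, bias_labels):
    """Mean cross-entropy of a linear softmax adversary predicting the bias from Z.

    Assumption: this is used only as a proxy for the mutual information
    between the representation and the unwanted attribute.
    """
    logits = Z @ W
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(bias_labels)), bias_labels].mean()


def encoder_objective(Z, C, W, bias_labels, lam=1.0, gamma=1.0):
    """Objective the encoder would minimize in this sketch.

    The clustering term is kept small while the adversary's loss is pushed up,
    so that the representation carries little information about the bias.
    The adversary itself would be trained to minimize its cross-entropy.
    """
    clustering = self_expression_loss(Z, C, gamma)
    return clustering - lam * adversary_cross_entropy(Z, W, bias_labels)
```

In a full training loop the two players would alternate: the adversary weights `W` are updated to lower the cross-entropy, and the encoder and `C` are updated to lower `encoder_objective`. Clusters would then typically be obtained by spectral clustering on an affinity built from `C`, a common choice in the subspace clustering literature that is assumed here rather than taken from the abstract.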

updated: Sun Nov 13 2022 01:19:06 GMT+0000 (UTC)

published: Sun Nov 13 2022 01:19:06 GMT+0000 (UTC)

arXiv
