Offline Clustering Approach to Self-supervised Learning for Class-imbalanced Image Data

Hye-min Chang; Sungkyun Chang

Class-imbalanced datasets are known to bias models toward the majority classes. In this project, we pose two research questions: 1) when is the class-imbalance problem more prevalent in self-supervised pre-training? and 2) can offline clustering of feature representations help pre-training on class-imbalanced data? Our experiments investigate the former question by adjusting the degree of class imbalance when training the baseline models, namely SimCLR and SimSiam, on the CIFAR-10 dataset. To answer the latter question, we train a separate expert model on each subset of the feature clusters. We then distill the knowledge of the expert models into a single model, so that we can compare its performance with that of our baselines.
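The data-preparation side of this pipeline can be illustrated with a minimal sketch. It rests on several assumptions that the abstract does not confirm: the degree of imbalance is controlled by an exponentially decaying per-class count (a common long-tailed construction), the offline clustering is plain k-means, and random vectors stand in for features from a pre-trained encoder. The function names `long_tailed_counts`, `subsample_imbalanced`, and `kmeans` are hypothetical, and expert training and distillation are left out.

```python
import numpy as np

def long_tailed_counts(n_max, num_classes, imbalance_ratio):
    """Per-class counts decaying exponentially from n_max to n_max / imbalance_ratio.

    Assumption: the abstract does not say how the imbalance is generated.
    """
    exponents = np.arange(num_classes) / (num_classes - 1)
    return np.rint(n_max * imbalance_ratio ** (-exponents)).astype(int)

def subsample_imbalanced(labels, counts, rng):
    """Return indices that keep at most counts[c] samples of each class c."""
    keep = []
    for c, n in enumerate(counts):
        idx = np.flatnonzero(labels == c)
        keep.append(rng.choice(idx, size=min(n, idx.size), replace=False))
    return np.concatenate(keep)

def kmeans(features, k, rng, n_iter=20):
    """Plain Lloyd's k-means, used here as a stand-in for the offline clustering step."""
    # Fancy indexing copies the rows, so updating centers does not modify features.
    centers = features[rng.choice(len(features), size=k, replace=False)]
    for _ in range(n_iter):
        dists = ((features[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        assign = dists.argmin(axis=1)
        for j in range(k):
            members = features[assign == j]
            if len(members) > 0:  # leave an empty cluster's center unchanged
                centers[j] = members.mean(axis=0)
    # Recompute assignments so they match the final centers.
    dists = ((features[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return dists.argmin(axis=1), centers

rng = np.random.default_rng(0)
labels = np.repeat(np.arange(10), 500)  # toy balanced labels, 10 classes
counts = long_tailed_counts(n_max=500, num_classes=10, imbalance_ratio=100)
keep = subsample_imbalanced(labels, counts, rng)

# Placeholder features; in practice these would come from a pre-trained encoder.
features = rng.normal(size=(len(keep), 16))
assign, centers = kmeans(features, k=4, rng=rng)

# One index subset per cluster; each subset would train one expert model.
subsets = [keep[assign == j] for j in range(4)]
```

Varying `imbalance_ratio` changes how skewed the training set is, which corresponds to the first research question; the `subsets` list is the partition on which the expert models of the second question would be trained.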

updated: Thu Dec 22 2022 01:26:38 GMT+0000 (UTC)

published: Thu Dec 22 2022 01:26:38 GMT+0000 (UTC)

arXiv
