An Embarrassingly Simple Baseline for Imbalanced Semi-Supervised Learning

Hao Chen; Yue Fan; Yidong Wang; Jindong Wang; Bernt Schiele; Xing Xie; Marios Savvides; Bhiksha Raj

Semi-supervised learning (SSL) has shown great promise in leveraging unlabeled data to improve model performance. While standard SSL assumes a uniform data distribution, we consider a more realistic and challenging setting called imbalanced SSL, where imbalanced class distributions occur in both labeled and unlabeled data. Although existing methods attempt to tackle this challenge, their performance degrades under severe imbalance because they cannot reduce the class imbalance sufficiently and effectively. In this paper, we study a simple yet overlooked baseline, SimiS, which tackles data imbalance by simply supplementing labeled data with pseudo-labels, according to the difference in class distribution from the most frequent class. Such a simple baseline turns out to be highly effective in reducing class imbalance. It outperforms existing methods by a significant margin, e.g., 12.8%, 13.6%, and 16.7% over the previous SOTA on CIFAR100-LT, FOOD101-LT, and ImageNet127, respectively. The reduced imbalance results in faster convergence and better pseudo-label accuracy for SimiS. The simplicity of our method also allows it to be combined with other re-balancing techniques to improve performance further. Moreover, our method shows great robustness to a wide range of data distributions, which holds enormous potential in practice. Code will be publicly available.
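The core idea stated in the abstract, topping up each class with pseudo-labels in proportion to its gap from the most frequent class, can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `supplement_with_pseudo_labels` is hypothetical, and ranking candidates by prediction confidence is an assumption, since the abstract does not say how pseudo-labeled samples are chosen.

```python
import numpy as np

def supplement_with_pseudo_labels(labeled_y, unlabeled_probs, num_classes):
    """Pick unlabeled samples to add to the labeled set, per class.

    labeled_y:       (N,) integer class labels of the labeled set
    unlabeled_probs: (M, num_classes) model probabilities on unlabeled data
    Returns (indices into the unlabeled set, their pseudo-labels).
    """
    # Per-class counts in the labeled set.
    counts = np.bincount(labeled_y, minlength=num_classes)
    # Gap between each class and the most frequent class.
    deficit = counts.max() - counts

    pseudo = unlabeled_probs.argmax(axis=1)  # predicted class per sample
    conf = unlabeled_probs.max(axis=1)       # confidence of that prediction

    chosen_idx, chosen_y = [], []
    for c in range(num_classes):
        if deficit[c] == 0:
            continue  # the most frequent class receives nothing
        candidates = np.flatnonzero(pseudo == c)
        # Assumption (not from the abstract): prefer confident predictions.
        ranked = candidates[np.argsort(-conf[candidates])]
        take = ranked[: deficit[c]]  # at most `deficit[c]` samples
        chosen_idx.extend(take.tolist())
        chosen_y.extend([c] * len(take))
    return np.array(chosen_idx, dtype=int), np.array(chosen_y, dtype=int)
```

If fewer unlabeled samples are predicted as a class than its deficit, the sketch simply adds all of them, so the supplemented set moves toward balance without necessarily reaching it.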

updated: Sun Nov 20 2022 21:18:41 GMT+0000 (UTC)

published: Sun Nov 20 2022 21:18:41 GMT+0000 (UTC)

arXiv
