InPL: Pseudo-labeling the Inliers First for Imbalanced Semi-supervised Learning

Zhuoran Yu; Yin Li; Yong Jae Lee

Recent state-of-the-art methods in imbalanced semi-supervised learning (SSL) rely on confidence-based pseudo-labeling with consistency regularization. To obtain high-quality pseudo-labels, a high confidence threshold is typically adopted. However, it has been shown that softmax-based confidence scores in deep networks can be arbitrarily high for samples far from the training data, so the pseudo-labels of even high-confidence unlabeled samples may still be unreliable. In this work, we present a new perspective on pseudo-labeling for imbalanced SSL. Instead of relying on model confidence, we propose to measure whether an unlabeled sample is likely to be "in-distribution", i.e., close to the current training data. To decide whether an unlabeled sample is "in-distribution" or "out-of-distribution", we adopt the energy score from the out-of-distribution detection literature. As training progresses and more unlabeled samples become in-distribution and contribute to training, the combined labeled and pseudo-labeled data better approximate the true class distribution, which in turn improves the model. Experiments demonstrate that our energy-based pseudo-labeling method, InPL, although conceptually simple, significantly outperforms confidence-based methods on imbalanced SSL benchmarks. For example, it yields around 3% absolute accuracy improvement on CIFAR10-LT. Combining it with state-of-the-art long-tailed SSL methods brings further improvements. In particular, in one of the most challenging scenarios, InPL achieves a 6.9% accuracy improvement over the best competitor.
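The abstract does not spell out the energy score, so the following is only a minimal sketch, assuming the definition commonly used in the out-of-distribution detection literature: E(x) = -T * log(sum_k exp(f_k(x) / T)), where f_k(x) are the logits and lower energy indicates a sample closer to the training data. The function names, the temperature default, the threshold value, and the example logits are all hypothetical choices for illustration, not values from the paper.

```python
import math


def free_energy(logits, temperature=1.0):
    """Energy score E(x) = -T * logsumexp(logits / T); lower = more in-distribution."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max before exponentiating for numerical stability
    lse = m + math.log(sum(math.exp(s - m) for s in scaled))
    return -temperature * lse


def softmax_confidence(logits):
    """Maximum softmax probability, the score used by confidence-based pseudo-labeling."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    return max(exps) / sum(exps)


def energy_pseudo_label(logits, energy_threshold, temperature=1.0):
    """Return the argmax class if the energy is below the threshold, else None."""
    if free_energy(logits, temperature) < energy_threshold:
        return max(range(len(logits)), key=logits.__getitem__)
    return None


# Hypothetical logits: `small` is `large` shifted down by 7, so the softmax
# confidence is identical while the energy differs by exactly 7.
large = [9.0, 1.0, 0.5]
small = [2.0, -6.0, -6.5]
threshold = -5.0  # assumed value, chosen only for this illustration

label_large = energy_pseudo_label(large, threshold)  # accepted as an inlier
label_small = energy_pseudo_label(small, threshold)  # rejected
```

Because softmax is invariant to adding a constant to all logits, the two example inputs are indistinguishable to a confidence threshold, whereas the energy score separates them. In a real training loop the same computation would be applied to batched tensors of logits for the unlabeled samples rather than Python lists.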

updated: Mon Mar 13 2023 16:45:41 GMT+0000 (UTC)

published: Mon Mar 13 2023 16:45:41 GMT+0000 (UTC)

arXiv
