Identifying Hard Noise in Long-Tailed Sample Distribution

Xuanyu Yi; Kaihua Tang; Xian-Sheng Hua; Joo-Hwee Lim; Hanwang Zhang

Conventional de-noising methods rely on the assumption that all samples are independent and identically distributed, so the resulting classifier, though disturbed by noise, can still easily identify noisy samples as outliers of the training distribution. However, this assumption is unrealistic for large-scale data, which is inevitably long-tailed. Such imbalanced training data makes a classifier less discriminative for the tail classes, whose previously "easy" noises now turn into "hard" ones: they are almost as much outliers as the clean tail samples. We introduce this new challenge as Noisy Long-Tailed Classification (NLT). Not surprisingly, we find that most de-noising methods fail to identify the hard noises, resulting in a significant performance drop on the three proposed NLT benchmarks: ImageNet-NLT, Animal10-NLT, and Food101-NLT. To this end, we design an iterative noisy learning framework called Hard-to-Easy (H2E). Our bootstrapping philosophy is to first learn a classifier as a noise identifier that is invariant to class and context distributional changes, reducing "hard" noises to "easy" ones, whose removal further improves the invariance. Experimental results show that H2E outperforms state-of-the-art de-noising methods and their ablations in long-tailed settings while maintaining stable performance in conventional balanced settings. Datasets and code are available at https://github.com/yxymessi/H2E-Framework
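
The "easy" versus "hard" noise distinction can be illustrated with a toy example. This is a hypothetical sketch, not the H2E method: it assumes the generic "large training loss means noisy label" outlier heuristic used by many conventional de-noising methods, and the per-sample loss values are invented for illustration (they are not results from the paper). It measures the best accuracy any single loss threshold can reach at separating noisy from clean samples, once for a head class and once for a tail class.

```python
from typing import List, Tuple

# Each sample is (per-sample training loss, ground-truth "label is noisy" flag).
# The loss values are invented for illustration only; they are NOT measurements
# from the paper.
Sample = Tuple[float, bool]

# Head class: the classifier fits clean samples well, so noise stands out.
HEAD: List[Sample] = [(0.10, False), (0.15, False), (0.20, False), (0.30, False),
                      (2.00, True), (2.40, True)]
# Tail class: the classifier fits clean tail samples poorly, so their losses
# overlap with those of the tail noise.
TAIL: List[Sample] = [(1.60, False), (1.90, False), (2.10, False), (2.30, False),
                      (2.00, True), (2.20, True)]


def flag_as_noise(samples: List[Sample], threshold: float) -> List[bool]:
    """Generic outlier rule: flag every sample whose loss exceeds the threshold."""
    return [loss > threshold for loss, _ in samples]


def best_separation(samples: List[Sample]) -> float:
    """Best accuracy any single loss threshold reaches at telling noise from clean."""
    losses = sorted(loss for loss, _ in samples)
    # Cutting at each observed loss (plus one cut below the minimum) covers
    # every distinct way a single threshold can split the samples.
    candidates = [losses[0] - 1.0] + losses
    best = 0.0
    for t in candidates:
        flags = flag_as_noise(samples, t)
        correct = sum(f == noisy for f, (_, noisy) in zip(flags, samples))
        best = max(best, correct / len(samples))
    return best


if __name__ == "__main__":
    print(f"head: {best_separation(HEAD):.2f}")  # "easy" noise: a clean cut exists
    print(f"tail: {best_separation(TAIL):.2f}")  # "hard" noise: no cut separates it
```

With these made-up numbers the head class is perfectly separable (1.00), while no threshold does better than 0.67 on the tail class, because clean tail samples look as much like outliers as the noisy ones.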

updated: Fri Mar 31 2023 07:03:13 GMT+0000 (UTC)

published: Wed Jul 27 2022 09:03:03 GMT+0000 (UTC)

arXiv