CrossSplit: Mitigating Label Noise Memorization through Data Splitting

Jihye Kim; Aristide Baratin; Yan Zhang; Simon Lacoste-Julien

We address the problem of improving the robustness of deep learning algorithms in the presence of label noise. Building upon existing label correction and co-teaching methods, we propose CrossSplit, a novel training procedure for mitigating the memorization of noisy labels that uses a pair of neural networks trained on two disjoint parts of the dataset. CrossSplit combines two main ingredients: (i) Cross-split label correction. Since a model trained on one part of the data cannot memorize example-label pairs from the other part, the training labels presented to each network can be smoothly adjusted using the predictions of its peer network. (ii) Cross-split semi-supervised training. A network trained on one part of the data also uses the unlabeled inputs of the other part. Extensive experiments on the CIFAR-10, CIFAR-100, Tiny-ImageNet and mini-WebVision datasets demonstrate that our method can outperform the current state of the art at noise ratios of up to 90%.
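As a rough illustration of ingredient (i), the sketch below splits a dataset into two disjoint halves and blends each given label with the prediction of the peer network, i.e. the one trained on the other half. It is not the authors' implementation: the function names, the fixed mixing weight `beta`, and the convex-combination form of the adjustment are all assumptions made for illustration, since the abstract only states that labels are "smoothly adjusted" using peer predictions.

```python
import numpy as np


def split_indices(n_samples, seed=0):
    """Randomly partition sample indices into two disjoint halves."""
    rng = np.random.default_rng(seed)
    perm = rng.permutation(n_samples)
    half = n_samples // 2
    return perm[:half], perm[half:]


def one_hot(labels, num_classes):
    """Convert integer class labels to one-hot rows."""
    return np.eye(num_classes)[labels]


def cross_split_soft_labels(noisy_labels, peer_probs, beta=0.5):
    """Blend given (possibly noisy) labels with peer-network predictions.

    noisy_labels: integer labels for the samples of one split.
    peer_probs:   class probabilities for those same samples, produced by
                  the network trained on the OTHER split (it never saw
                  these labels, so it cannot have memorized them).
    beta:         hypothetical fixed weight on the given label; the
                  abstract does not specify how the weight is chosen.
    """
    num_classes = peer_probs.shape[1]
    return beta * one_hot(noisy_labels, num_classes) + (1.0 - beta) * peer_probs


if __name__ == "__main__":
    labels = np.array([0, 1, 2, 0, 1, 2])
    idx_a, idx_b = split_indices(len(labels))
    # Stand-in for the peer network's predictions on split A (uniform here).
    peer_probs_a = np.full((len(idx_a), 3), 1.0 / 3.0)
    soft_a = cross_split_soft_labels(labels[idx_a], peer_probs_a, beta=0.5)
    print(soft_a)  # each row is a valid probability vector
```

In a full training loop, the soft labels for split A would serve as targets for the network trained on A, and symmetrically for split B; the semi-supervised ingredient (ii) is not covered by this sketch.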

updated: Sat Dec 03 2022 19:09:56 GMT+0000 (UTC)

published: Sat Dec 03 2022 19:09:56 GMT+0000 (UTC)

arXiv
