Augmentation Strategies for Learning with Noisy Labels

Kento Nishi; Yi Ding; Alex Rich; Tobias Höllerer

Imperfect labels are ubiquitous in real-world datasets. Several recent successful methods for training deep neural networks (DNNs) robust to label noise have used two primary techniques: filtering samples based on loss during a warm-up phase to curate an initial set of cleanly labeled samples, and using the output of a network as a pseudo-label for subsequent loss calculations. In this paper, we evaluate different augmentation strategies for algorithms tackling the "learning with noisy labels" problem. We propose and examine multiple augmentation strategies and evaluate them using synthetic datasets based on CIFAR-10 and CIFAR-100, as well as on the real-world dataset Clothing1M. Due to several commonalities in these algorithms, we find that using one set of augmentations for loss modeling tasks and another set for learning is the most effective, improving results on the state-of-the-art and other previous methods. Furthermore, we find that applying augmentation during the warm-up period can negatively impact the loss convergence behavior of correctly versus incorrectly labeled samples. We introduce this augmentation strategy to the state-of-the-art technique and demonstrate that we can improve performance across all evaluated noise levels. In particular, we improve accuracy on the CIFAR-10 benchmark at 90% symmetric noise by more than 15% in absolute accuracy and we also improve performance on the real-world dataset Clothing1M. (* equal contribution)
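The two techniques named in the abstract, loss-based filtering and pseudo-labeling, together with the idea of using separate augmentation sets for loss modeling and for learning, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration rather than the authors' implementation: the function names, the fixed loss threshold, the toy probabilities, and the choice of a flip for the loss-modeling view and a flip plus column masking for the learning view are all hypothetical.

```python
import math
import random


def cross_entropy(probs, label):
    """Per-sample cross-entropy given predicted class probabilities."""
    return -math.log(max(probs[label], 1e-12))


def loss_modeling_augment(image, rng):
    """Augmentation for the loss-modeling view: random horizontal flip.

    The specific operation is an assumption, not taken from the paper.
    """
    if rng.random() < 0.5:
        return [row[::-1] for row in image]
    return [row[:] for row in image]


def learning_augment(image, rng):
    """Augmentation for the learning view: flip plus zeroing one pixel column.

    A stand-in for a separate augmentation set; the operation is an assumption.
    """
    out = loss_modeling_augment(image, rng)
    col = rng.randrange(len(out[0]))
    for row in out:
        row[col] = 0
    return out


def split_by_loss(losses, threshold):
    """Treat low-loss samples as clean and high-loss samples as noisy."""
    clean = [i for i, loss in enumerate(losses) if loss < threshold]
    noisy = [i for i, loss in enumerate(losses) if loss >= threshold]
    return clean, noisy


def pseudo_label(probs):
    """Use the network's most confident class as the training target."""
    return max(range(len(probs)), key=lambda k: probs[k])


# Toy network outputs for four samples over three classes (hypothetical).
predictions = [
    [0.90, 0.05, 0.05],
    [0.10, 0.80, 0.10],
    [0.70, 0.20, 0.10],
    [0.05, 0.05, 0.90],
]
given_labels = [0, 1, 2, 2]  # the third label disagrees with the network

# Filter on per-sample loss; the threshold value is an arbitrary assumption.
losses = [cross_entropy(p, y) for p, y in zip(predictions, given_labels)]
clean_idx, noisy_idx = split_by_loss(losses, threshold=1.0)

# Keep the given label for clean samples, substitute a pseudo-label otherwise.
targets = [
    given_labels[i] if i in clean_idx else pseudo_label(predictions[i])
    for i in range(len(predictions))
]

# Two views of one toy image: one for loss modeling, one for learning.
rng = random.Random(0)
image = [[1, 2, 3], [4, 5, 6]]
loss_view = loss_modeling_augment(image, rng)
train_view = learning_augment(image, rng)
```

In this sketch the loss-modeling view would be the input used to compute the per-sample losses and pseudo-labels, while the learning view would be the input used for the gradient update. A fixed threshold is used here only to keep the example short; it stands in for whatever loss model a given method fits.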

updated: Wed Mar 03 2021 02:19:35 GMT+0000 (UTC)

published: Wed Mar 03 2021 02:19:35 GMT+0000 (UTC)

arXiv
