DiscrimLoss: A Universal Loss for Hard Samples and Incorrect Samples Discrimination

Tingting Wu; Xiao Ding; Hao Zhang; Jinglong Gao; Li Du; Bing Qin; Ting Liu

Given data with label noise (i.e., incorrectly labeled data), deep neural networks gradually memorize the noisy labels, which impairs model performance. To mitigate this issue, curriculum learning has been proposed to improve model performance and generalization by ordering training samples in a meaningful (e.g., easy-to-hard) sequence. Previous work treats incorrect samples as generic hard samples, without discriminating between hard samples (i.e., hard samples among correctly labeled data) and incorrect samples. Yet a model should learn from hard samples to promote generalization rather than overfit to incorrect ones. In this paper, we address this problem by appending a novel loss function, DiscrimLoss, on top of the existing task loss. In the early stages of training, its main effect is to automatically and stably estimate the importance of easy samples and difficult samples (the latter including both hard and incorrect samples) to improve model performance. In the following stages, DiscrimLoss is dedicated to discriminating between hard and incorrect samples to improve model generalization. Such a training strategy can be formulated dynamically in a self-supervised manner, effectively mimicking the main principle of curriculum learning. Experiments on image classification, image regression, text sequence regression, and event relation reasoning demonstrate the versatility and effectiveness of our method, particularly in the presence of diversified noise levels.
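The abstract does not give the DiscrimLoss formula, so the following is only a rough sketch of the general idea of placing a per-sample reweighting on top of an existing task loss. It is not the authors' method. The function names, the sigmoid weighting, and the `threshold` and `temperature` parameters are all assumptions made for illustration.

```python
import math


def sample_weights(losses, threshold, temperature=1.0):
    """Map per-sample task losses to weights in (0, 1).

    Hypothetical weighting (not the DiscrimLoss formula): samples whose
    loss is below `threshold` get a weight above 0.5, and samples whose
    loss is far above it get a weight close to 0.
    """
    return [
        1.0 / (1.0 + math.exp((loss - threshold) / temperature))
        for loss in losses
    ]


def reweighted_loss(losses, threshold, temperature=1.0):
    """Average the task losses after scaling each one by its weight."""
    weights = sample_weights(losses, threshold, temperature)
    return sum(w * l for w, l in zip(weights, losses)) / len(losses)


def easy_to_hard_order(losses):
    """Return sample indices sorted from lowest to highest loss,
    i.e. an easy-to-hard curriculum ordering."""
    return sorted(range(len(losses)), key=lambda i: losses[i])


# Three per-sample losses: two small ones and one very large one,
# the kind of value a mislabeled sample might produce.
losses = [0.1, 0.5, 4.0]
weights = sample_weights(losses, threshold=1.0)
order = easy_to_hard_order(losses)
```

Note that a fixed threshold like this cannot by itself separate hard-but-correct samples from incorrect ones, since both tend to have high loss. That distinction is the problem the paper sets out to address.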

updated: Sun Aug 21 2022 13:38:55 GMT+0000 (UTC)

published: Sun Aug 21 2022 13:38:55 GMT+0000 (UTC)

arXiv
