Consensual Collaborative Training And Knowledge Distillation Based Facial Expression Recognition Under Noisy Annotations

Darshan Gera; S. Balasubramanian

The presence of noise in the labels of large-scale facial expression datasets has been a key challenge for Facial Expression Recognition (FER) in the wild. During the early learning stage, deep networks fit the clean data. Eventually, they start overfitting to the noisy labels because of their memorization ability, which limits FER performance. This work proposes an effective training strategy for the presence of noisy labels, called the Consensual Collaborative Training (CCT) framework. CCT co-trains three networks jointly using a convex combination of a supervision loss and a consistency loss, without making any assumption about the noise distribution. A dynamic transition mechanism moves the emphasis from the supervision loss in early learning to the consistency loss, which drives consensus among the networks' predictions, in the later stage. Inference is done using a single network based on a simple knowledge distillation scheme. The effectiveness of the proposed framework is demonstrated on synthetic as well as real noisy FER datasets. In addition, a large test subset of around 5K images from the FEC dataset is annotated using the crowd wisdom of 16 different annotators, reliable labels are inferred from these annotations, and CCT is validated on this subset as well. State-of-the-art performance is reported on the benchmark FER datasets RAFDB (90.84%), FERPlus (89.99%), and AffectNet (66%). Our code is available at https://github.com/1980x/CCT.
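To make the loss structure in the abstract concrete, the following is a minimal sketch of a convex combination of supervision and consistency losses for three networks on a single sample, with a weight that grows over training. It is not taken from the authors' repository. The function names, the pairwise KL form of the consistency term, the exponential ramp, and the values of `lam_max` and `beta` are all assumptions made for illustration.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(probs, label):
    """Supervision loss for one sample given its (possibly noisy) label."""
    return -math.log(max(probs[label], 1e-12))

def kl_divergence(p, q):
    """KL(p || q) between two discrete distributions."""
    return sum(pi * math.log(pi / max(qi, 1e-12)) for pi, qi in zip(p, q) if pi > 0)

def ramp_weight(epoch, ramp_epochs, lam_max=0.9, beta=4.0):
    """Hypothetical schedule: small weight early, lam_max after ramp_epochs."""
    t = min(epoch, ramp_epochs) / ramp_epochs
    return lam_max * math.exp(-beta * (1.0 - t) ** 2)

def cct_style_loss(logits_per_net, label, lam):
    """(1 - lam) * supervision + lam * consistency, averaged over networks.

    Assumption: consistency is the mean KL divergence over all ordered
    pairs of networks; the paper may define it differently.
    """
    probs = [softmax(l) for l in logits_per_net]
    n = len(probs)
    sup = sum(cross_entropy(p, label) for p in probs) / n
    pairs = [(i, j) for i in range(n) for j in range(n) if i != j]
    cons = sum(kl_divergence(probs[i], probs[j]) for i, j in pairs) / len(pairs)
    return (1.0 - lam) * sup + lam * cons

# Three networks disagree slightly on one 3-class sample labeled class 0.
logits = [[2.0, 0.5, -1.0], [1.5, 0.8, -0.5], [2.2, 0.1, -0.9]]
for epoch in (0, 5, 10):
    lam = ramp_weight(epoch, ramp_epochs=10)
    print(epoch, round(lam, 3), round(cct_style_loss(logits, 0, lam), 4))
```

With `lam` near 0 the loss is dominated by the label, which matches the early stage in which networks fit clean data. As `lam` approaches `lam_max`, agreement among the networks dominates, which reduces the pull of a possibly wrong label.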

updated: Sat Jul 10 2021 03:37:06 GMT+0000 (UTC)

published: Sat Jul 10 2021 03:37:06 GMT+0000 (UTC)

arXiv
