Affect Expression Behaviour Analysis in the Wild using Consensual Collaborative Training

Darshan Gera; S Balasubramanian

Facial expression recognition (FER) in the wild is crucial for building reliable human-computer interactive systems. However, annotating large-scale FER datasets has been a key challenge, because these datasets suffer from label noise arising from crowdsourcing, annotator subjectivity, poor image quality, and automatic labelling based on keyword search, among other factors. Such noisy annotations impede FER performance because deep networks are able to memorize them: in the early stage of learning, deep networks fit the clean data, but they eventually start overfitting to the noisy labels. This report presents the Consensual Collaborative Training (CCT) framework used in our submission to the expression recognition track of the Affective Behaviour Analysis in-the-wild (ABAW) 2021 competition. CCT co-trains three networks jointly using a convex combination of a supervision loss and a consistency loss, without making any assumption about the noise distribution. A dynamic transition mechanism shifts the emphasis from the supervision loss in early learning to the consistency loss, which drives consensus among the networks' predictions, in the later stage. Co-training reduces the overall error, and the consistency loss prevents overfitting to noisy samples. The performance of the model is validated on the challenging Aff-Wild2 dataset for categorical expression classification. Our code is publicly available at https://github.com/1980x/ABAW2021DMACS.
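
The convex combination of losses described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the loss terms or the schedule, so the sketch assumes cross-entropy as the supervision loss, the average pairwise KL divergence between the networks' predicted distributions as the consistency loss, and a linear ramp for the transition weight. The names `transition_weight`, `cct_loss`, `lam_max`, and `ramp_epochs` are hypothetical.

```python
import math

EPS = 1e-12  # guards against log(0)

def softmax(logits):
    """Convert a list of logits into a probability distribution."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(probs, label):
    """Supervision term for one sample (assumed form)."""
    return -math.log(probs[label] + EPS)

def kl_divergence(p, q):
    """KL(p || q) between two discrete distributions."""
    return sum(pi * math.log((pi + EPS) / (qi + EPS)) for pi, qi in zip(p, q))

def transition_weight(epoch, ramp_epochs, lam_max=0.9):
    """Assumed schedule: weight rises linearly from 0 to lam_max, then stays flat."""
    return lam_max * min(max(epoch / ramp_epochs, 0.0), 1.0)

def cct_loss(logits_per_net, label, lam):
    """Convex combination (1 - lam) * supervision + lam * consistency for one sample."""
    probs = [softmax(z) for z in logits_per_net]
    n = len(probs)
    # Supervision: mean cross-entropy of each network against the (possibly noisy) label.
    sup = sum(cross_entropy(p, label) for p in probs) / n
    # Consistency: mean KL divergence over all ordered pairs of distinct networks.
    pairs = [(i, j) for i in range(n) for j in range(n) if i != j]
    cons = sum(kl_divergence(probs[i], probs[j]) for i, j in pairs) / len(pairs)
    return (1.0 - lam) * sup + lam * cons

# Three networks, three classes, one sample labelled as class 0.
logits = [[2.0, 0.5, 0.1], [1.5, 0.7, 0.2], [0.3, 1.8, 0.4]]
early = cct_loss(logits, label=0, lam=transition_weight(epoch=1, ramp_epochs=10))
late = cct_loss(logits, label=0, lam=transition_weight(epoch=10, ramp_epochs=10))
```

With a small weight early in training, the loss is dominated by agreement with the given labels; as the weight grows, disagreement between the three networks dominates, so a label that the networks collectively contradict contributes less to the objective.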

updated: Sat Jul 24 2021 05:28:32 GMT+0000 (UTC)

published: Thu Jul 08 2021 04:28:21 GMT+0000 (UTC)

arXiv
