Facial Expression Classification using Fusion of Deep Neural Network in Video for the 3rd ABAW3 Competition

Kim Ngan Phan; Hong-Hai Nguyen; Van-Thong Huynh; Soo-Hyung Kim

Expression classification is an important problem in human-computer interaction, since it lets computers recognize human emotions. In the 3rd Affective Behavior Analysis In-The-Wild (ABAW3) competition, the expression classification task covers eight classes, including the six basic expressions, on human faces in videos. In this paper, we use a transformer mechanism to encode robust representations from the backbone, and fusing these representations plays an important role in the classification task. Our approach achieves F_1 scores of 30.35% and 28.60% on the validation and test sets, respectively. These results show the effectiveness of the proposed architecture on the Aff-Wild2 dataset.
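The abstract reports F_1 scores but does not say how they are averaged. As a minimal sketch, assuming a macro-averaged F_1 over the eight expression classes (the unweighted mean of per-class F_1), the metric could be computed as follows. The function name `macro_f1` and the integer label encoding 0..7 are illustrative assumptions, not taken from the paper.

```python
from collections import Counter


def macro_f1(y_true, y_pred, num_classes=8):
    """Unweighted mean of per-class F1 (assumed averaging scheme).

    y_true, y_pred: sequences of integer class labels in [0, num_classes).
    A class with no true and no predicted samples contributes 0.0.
    """
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class p, but it was wrong
            fn[t] += 1  # true class t was missed

    scores = []
    for c in range(num_classes):
        # F1 = 2*TP / (2*TP + FP + FN)
        denom = 2 * tp[c] + fp[c] + fn[c]
        scores.append(2 * tp[c] / denom if denom else 0.0)
    return sum(scores) / num_classes


if __name__ == "__main__":
    # Toy two-class example, only to show the call.
    print(macro_f1([0, 0, 1, 1], [0, 1, 1, 1], num_classes=2))
```

Under macro averaging every class counts equally, so rare expressions weigh as much as frequent ones. This is one reason scores on imbalanced in-the-wild data can look low compared with plain accuracy.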

updated: Fri Apr 08 2022 05:20:40 GMT+0000 (UTC)

published: Thu Mar 24 2022 07:36:21 GMT+0000 (UTC)

arXiv
