Spatial and Temporal Networks for Facial Expression Recognition in the Wild Videos

Shuyi Mao; Xinqi Fan; Xiaojiang Peng

This paper describes our methodology for the seven basic expression classification track of the Affective Behavior Analysis in-the-wild (ABAW) Competition 2021. In this task, facial expression recognition (FER) methods must assign the correct expression category to faces captured against diverse backgrounds, which poses several challenges. First, to adapt the model to in-the-wild scenarios, we transfer knowledge from models pre-trained on large-scale face recognition data. Second, we propose an ensemble of a convolutional neural network (CNN), a CNN-recurrent neural network (CNN-RNN), and a CNN-Transformer to incorporate both spatial and temporal information. On the validation set, our ensemble achieved an F1 score of 0.4133, an accuracy of 0.6216, and a final metric of 0.4821.
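As a rough illustration of how three per-frame classifiers might be combined and scored, the following is a minimal sketch and not the authors' implementation. It assumes soft voting (averaging class probabilities), macro-averaged F1, and a final metric of 0.67 × F1 + 0.33 × accuracy. The abstract states none of these, and the weights are an assumption based on the ABAW 2021 expression-track scoring rule. All function names are hypothetical.

```python
from typing import List, Sequence

NUM_CLASSES = 7  # seven basic expression categories


def soft_vote(prob_sets: Sequence[Sequence[Sequence[float]]]) -> List[int]:
    """Average per-frame class probabilities over models, then take the argmax.

    prob_sets[m][i][c] is model m's probability of class c for frame i.
    Soft voting is an assumed fusion rule; the abstract does not specify one.
    """
    n_models = len(prob_sets)
    n_frames = len(prob_sets[0])
    preds = []
    for i in range(n_frames):
        avg = [sum(m[i][c] for m in prob_sets) / n_models
               for c in range(NUM_CLASSES)]
        preds.append(max(range(NUM_CLASSES), key=avg.__getitem__))
    return preds


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of frames whose predicted label matches the ground truth."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true: Sequence[int], y_pred: Sequence[int],
             num_classes: int = NUM_CLASSES) -> float:
    """Unweighted mean of per-class F1 (assumed averaging mode)."""
    scores = []
    for c in range(num_classes):
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / num_classes


def final_metric(f1: float, acc: float,
                 w_f1: float = 0.67, w_acc: float = 0.33) -> float:
    """Weighted combination of F1 and accuracy (weights are an assumption)."""
    return w_f1 * f1 + w_acc * acc


if __name__ == "__main__":
    # Plugging in the reported validation F1 and accuracy.
    print(round(final_metric(0.4133, 0.6216), 4))
```

Under these assumed weights, the reported F1 and accuracy combine to about 0.482, which agrees with the reported final metric of 0.4821 up to rounding of the inputs.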

updated: Mon Jul 12 2021 01:41:23 GMT+0000 (UTC)

published: Mon Jul 12 2021 01:41:23 GMT+0000 (UTC)

arXiv
