MViT: Mask Vision Transformer for Facial Expression Recognition in the wild

Hanting Li; Mingzhe Sui; Feng Zhao; Zhengjun Zha; Feng Wu

Facial Expression Recognition (FER) in the wild is an extremely challenging task in computer vision due to varied backgrounds, low-quality facial images, and the subjectivity of annotators. These uncertainties make it difficult for neural networks to learn robust features on limited-scale datasets. Moreover, networks can be easily distracted by these factors and make incorrect decisions. Recently, the vision transformer (ViT) and data-efficient image transformers (DeiT) have shown strong performance on traditional classification tasks. The self-attention mechanism gives transformers a global receptive field from the first layer, which dramatically enhances their feature extraction capability. In this work, we first propose a novel pure transformer-based mask vision transformer (MViT) for FER in the wild, which consists of two modules: a transformer-based mask generation network (MGN) that generates a mask to filter out complex backgrounds and occlusions in face images, and a dynamic relabeling module that rectifies incorrect labels in in-the-wild FER datasets. Extensive experimental results demonstrate that our MViT outperforms state-of-the-art methods on RAF-DB (88.62%), FERPlus (89.22%), and AffectNet-7 (64.57%), and achieves a comparable result on AffectNet-8 (61.40%).
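The abstract names the two modules but gives no formulas, so the following is only an illustrative sketch of the general idea, not the authors' method. It assumes that the mask is a per-patch weight in [0, 1] multiplied into patch features, and that relabeling switches a sample to the top predicted class when that class's probability exceeds the given label's probability by a margin. The function names `apply_mask` and `relabel` and the `margin` parameter are hypothetical.

```python
from math import exp


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def apply_mask(patch_features, mask):
    """Scale each patch's feature vector by its mask weight.

    Assumption: one weight in [0, 1] per patch; a weight near 0
    suppresses a background or occluded patch.
    """
    if len(patch_features) != len(mask):
        raise ValueError("need exactly one mask weight per patch")
    return [[w * v for v in feat] for feat, w in zip(patch_features, mask)]


def relabel(logits, label, margin=0.2):
    """Hypothetical relabeling rule (not taken from the paper).

    Replace the given label with the top predicted class when that
    class's probability exceeds the given label's by more than margin.
    """
    probs = softmax(logits)
    top = max(range(len(probs)), key=probs.__getitem__)
    if top != label and probs[top] - probs[label] > margin:
        return top
    return label
```

For example, `apply_mask([[1.0, 2.0], [3.0, 4.0]], [1.0, 0.0])` zeroes out the second patch, and `relabel([5.0, 0.0, 0.0], 1)` switches the label to class 0 because the model is far more confident in it than in the given label.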

updated: Tue Jun 08 2021 16:58:10 GMT+0000 (UTC)

published: Tue Jun 08 2021 16:58:10 GMT+0000 (UTC)

arXiv

