A Multi-modal and Multi-task Learning Method for Action Unit and Expression Recognition

Yue Jin; Tianqing Zheng; Chao Gao; Guoqiang Xu

Analyzing human affect is vital for human-computer interaction systems. Most methods are developed in restricted scenarios that are not practical for in-the-wild settings. The Affective Behavior Analysis in-the-wild (ABAW) 2021 Contest provides a benchmark for this in-the-wild problem. In this paper, we introduce a multi-modal and multi-task learning method that uses both visual and audio information. We train the model on both action unit (AU) and expression annotations and apply a sequence model to further extract associations between video frames. We achieve an AU score of 0.712 and an expression score of 0.477 on the validation set. These results demonstrate the effectiveness of our approach in improving model performance.
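
The abstract does not spell out how the AU and expression annotations are combined during training. As a rough illustration only, the sketch below shows one common way to form a multi-task objective: a multi-label binary cross-entropy over AUs plus a softmax cross-entropy over expression classes, where a task is skipped when its label is missing for a frame. The function names, the weights `au_weight` and `expr_weight`, and the missing-label handling are all assumptions made for this example, not details taken from the paper; the audio-visual fusion and the sequence model are not shown.

```python
import math


def _log_sigmoid(x):
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def au_loss(logits, labels):
    """Mean binary cross-entropy over AUs (multi-label: each AU is on or off)."""
    total = 0.0
    for z, y in zip(logits, labels):
        # log(1 - sigmoid(z)) equals log(sigmoid(-z))
        total -= y * _log_sigmoid(z) + (1 - y) * _log_sigmoid(-z)
    return total / len(logits)


def expr_loss(logits, label):
    """Softmax cross-entropy for a single expression class index."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]


def multitask_loss(au_logits, expr_logits, au_labels=None, expr_label=None,
                   au_weight=1.0, expr_weight=1.0):
    """Weighted sum of the two task losses (hypothetical weighting scheme).

    A task whose label is None contributes nothing, so a frame annotated
    for only one task still yields a usable training signal.
    """
    loss = 0.0
    if au_labels is not None:
        loss += au_weight * au_loss(au_logits, au_labels)
    if expr_label is not None:
        loss += expr_weight * expr_loss(expr_logits, expr_label)
    return loss
```

With all logits at zero, each AU contributes ln 2 and a three-class expression head contributes ln 3, which gives a quick sanity check on the arithmetic. In practice the two heads would share a feature extractor, so gradients from both tasks update the same representation.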

updated: Fri Jul 09 2021 03:28:17 GMT+0000 (UTC)

published: Fri Jul 09 2021 03:28:17 GMT+0000 (UTC)

arXiv
