Face Behavior a la carte: Expressions, Affect and Action Units in a Single Network

Dimitrios Kollias; Viktoriia Sharmanska; Stefanos Zafeiriou

Automatic facial behavior analysis has a long history of study at the intersection of computer vision, physiology and psychology. However, it is only recently, with the collection of large-scale datasets and the advent of powerful machine learning methods such as deep neural networks, that automatic facial behavior analysis has started to thrive. Three of its iconic tasks are automatic recognition of basic expressions (e.g., happy, sad, surprised), estimation of continuous emotions (e.g., valence and arousal), and detection of facial action units (e.g., activations of upper/inner eyebrows or nose wrinkles). Up until now, these tasks have mostly been studied independently, each with a dataset collected for that task. We present the first and largest study of all facial behavior tasks learned jointly in a single multi-task, multi-domain and multi-label network, which we call FaceBehaviorNet. For this, we utilize all publicly available datasets in the community (around 5M images) that study facial behavior tasks in-the-wild. We demonstrate that jointly training an end-to-end network for all tasks consistently outperforms training each of the single-task networks. Furthermore, we propose two simple strategies for coupling the tasks during training, co-annotation and distribution matching, and show the advantages of this approach. Finally, we show that FaceBehaviorNet has learned features that encapsulate all aspects of facial behavior and can be successfully applied, in zero- and few-shot learning settings, to tasks beyond the ones it was trained on (compound emotion recognition).
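To make the "multi-task, multi-domain, multi-label" setting concrete, the following is a minimal sketch of how a joint loss over the three tasks might be assembled when each training image carries annotations for only some tasks. It is not the authors' implementation: the head names (`expression`, `valence_arousal`, `action_units`), the specific loss functions (softmax cross-entropy, mean squared error, binary cross-entropy) and the unweighted sum are all assumptions chosen for illustration, and the co-annotation and distribution-matching strategies mentioned in the abstract are not modeled here.

```python
import math


def softmax_cross_entropy(logits, label):
    """Cross-entropy for one single-label sample (basic expression class)."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]


def binary_cross_entropy(logits, targets):
    """Mean sigmoid cross-entropy over action units (multi-label, targets 0/1)."""
    total = 0.0
    for z, t in zip(logits, targets):
        softplus = max(z, 0.0) + math.log1p(math.exp(-abs(z)))  # log(1 + e^z)
        total += softplus - t * z
    return total / len(logits)


def mean_squared_error(pred, target):
    """Regression loss for continuous valence/arousal (an assumed stand-in)."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def multitask_loss(outputs, labels):
    """Sum the losses of the tasks that are annotated for this sample.

    `outputs` holds the three hypothetical head outputs; `labels` maps the
    same keys to annotations, with None (or a missing key) where the source
    dataset does not annotate that task.
    """
    loss = 0.0
    if labels.get("expression") is not None:
        loss += softmax_cross_entropy(outputs["expression"], labels["expression"])
    if labels.get("valence_arousal") is not None:
        loss += mean_squared_error(outputs["valence_arousal"], labels["valence_arousal"])
    if labels.get("action_units") is not None:
        loss += binary_cross_entropy(outputs["action_units"], labels["action_units"])
    return loss


# Example: an image from an expression-only dataset contributes one loss term.
outputs = {
    "expression": [0.0] * 7,          # 7 basic-expression logits (assumed count)
    "valence_arousal": [0.2, -0.1],   # predicted valence, arousal
    "action_units": [0.0, 0.0, 0.0],  # logits for 3 illustrative action units
}
expression_only = multitask_loss(outputs, {"expression": 0})
fully_labeled = multitask_loss(
    outputs,
    {"expression": 0, "valence_arousal": [0.2, -0.1], "action_units": [1, 0, 1]},
)
```

Skipping the terms for missing annotations is one simple way a single network could be trained on datasets that each label only a subset of the tasks; how the tasks are actually coupled in FaceBehaviorNet is described in the paper itself.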

updated: Fri May 29 2020 02:35:49 GMT+0000 (UTC)

published: Tue Oct 15 2019 15:45:41 GMT+0000 (UTC)

arXiv
