Weakly Supervised Regional and Temporal Learning for Facial Action Unit Recognition

Jingwei Yan; Jingjing Wang; Qiang Li; Chunmao Wang; Shiliang Pu

Automatic facial action unit (AU) recognition is a challenging task due to the scarcity of manual annotations. To alleviate this problem, considerable effort has been devoted to weakly supervised methods that leverage large amounts of unlabeled data. However, some unique properties of AUs, such as their regional and relational characteristics, have not been sufficiently explored in previous work. Motivated by this, we take these AU properties into consideration and propose two auxiliary AU-related tasks that use unlabeled data in a self-supervised manner to bridge the gap between limited annotations and model performance. Specifically, to enhance the discriminability of regional features with AU relation embedding, we design an RoI inpainting task that recovers randomly cropped AU patches. In addition, we propose a single-image-based optical flow estimation task that leverages the dynamic changes of facial muscles and encodes motion information into the global feature representation. With these two self-supervised auxiliary tasks, the backbone network better captures the local features, mutual relations, and motion cues of AUs. Furthermore, by incorporating semi-supervised learning, we propose an end-to-end trainable framework named weakly supervised regional and temporal learning (WSRTL) for AU recognition. Extensive experiments on BP4D and DISFA demonstrate the superiority of our method, which achieves new state-of-the-art performance.
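The RoI inpainting task mentioned in the abstract can be illustrated with a small sketch of its data-preparation step: remove a few patches around AU locations, then score a reconstruction only on the removed pixels. This is not the authors' implementation. The abstract does not specify the patch size, the number of removed patches, the AU centre coordinates, or the loss, so `mask_random_rois`, `masked_l1`, and every parameter value below are hypothetical choices made for illustration.

```python
import random


def mask_random_rois(image, centers, patch=2, n_mask=2, seed=0):
    """Zero out n_mask randomly chosen square patches around AU centres.

    image   : 2-D list of floats (H x W), a stand-in for a face image.
    centers : list of (row, col) AU centres (hypothetical, e.g. derived
              from facial landmarks). Needs len(centers) >= n_mask.
    Returns (masked_image, mask); mask[r][c] is 1 inside a removed patch.
    """
    rng = random.Random(seed)
    h, w = len(image), len(image[0])
    masked = [row[:] for row in image]      # copy, so the input is untouched
    mask = [[0] * w for _ in range(h)]
    half = patch // 2
    for r0, c0 in rng.sample(centers, n_mask):
        # Clip the patch to the image borders.
        for r in range(max(0, r0 - half), min(h, r0 + half)):
            for c in range(max(0, c0 - half), min(w, c0 + half)):
                masked[r][c] = 0.0
                mask[r][c] = 1
    return masked, mask


def masked_l1(pred, target, mask):
    """Mean absolute error over the removed pixels only (assumed loss)."""
    total = 0.0
    count = 0
    for pred_row, target_row, mask_row in zip(pred, target, mask):
        for p, t, m in zip(pred_row, target_row, mask_row):
            total += abs(p - t) * m
            count += m
    return total / max(count, 1)


# Toy 8x8 "image" and three made-up AU centres.
image = [[float(r * 8 + c) for c in range(8)] for r in range(8)]
centers = [(2, 2), (2, 6), (6, 4)]
masked, mask = mask_random_rois(image, centers)
print(masked_l1(masked, image, mask))  # error of the trivial "all zeros" fill
```

In a training setup, an inpainting head would take `masked` as input and its output would replace `masked` in the call to `masked_l1`; restricting the loss to the removed region is one common design choice, not something the abstract confirms.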

updated: Fri Apr 01 2022 12:02:01 GMT+0000 (UTC)

published: Fri Apr 01 2022 12:02:01 GMT+0000 (UTC)

arXiv
