Towards an objective characterization of an individual's facial movements using Self-Supervised Person-Specific-Models

Yanis Tazi; Michael Berger; Winrich A. Freiwald

Disentangling facial movements from other facial characteristics, particularly from facial identity, remains a challenging task, as facial movements vary greatly between individuals. In this paper, we aim to characterize individual-specific facial movements. We present a novel training approach that learns facial movements independently of other facial characteristics, focusing on each individual separately. We propose self-supervised Person-Specific Models (PSMs), in which one model per individual learns, from unlabeled facial video, to extract an embedding of facial movements that is independent of the person's identity and other structural facial characteristics. These models are trained using encoder-decoder-like architectures. We provide quantitative and qualitative evidence that a PSM learns a meaningful facial embedding that discovers fine-grained movements not otherwise characterized by a General Model (GM), which is trained across individuals and characterizes general patterns of facial movements. We also present quantitative and qualitative evidence that this approach scales easily and generalizes to new individuals: knowledge of facial movements learned from one person can be transferred quickly and effectively to a new person. Lastly, we propose a novel PSM that uses curriculum temporal learning to leverage the temporal contiguity between video frames. Our code, analysis details, and all pretrained models are available on GitHub and in the Supplementary Materials.

updated: Tue Nov 15 2022 16:30:24 GMT+0000 (UTC)

published: Tue Nov 15 2022 16:30:24 GMT+0000 (UTC)

arXiv
