Self-Supervised 3D Face Reconstruction via Conditional Estimation

Yandong Wen; Weiyang Liu; Bhiksha Raj; Rita Singh

We present a conditional estimation (CEST) framework that learns 3D facial parameters from single-view 2D images through self-supervised training on videos. CEST follows an analysis-by-synthesis approach: the 3D facial parameters (shape, reflectance, viewpoint, and illumination) are estimated from a face image and then recombined to reconstruct the 2D face image. To learn semantically meaningful 3D facial parameters without explicit access to their labels, CEST couples the estimation of the different 3D facial parameters by taking their statistical dependencies into account. Specifically, the estimate of each 3D facial parameter is conditioned not only on the given image but also on the facial parameters that have already been derived. In addition, reflectance symmetry and reflectance consistency across video frames are exploited to improve the disentanglement of the facial parameters. Together with a novel strategy for incorporating reflectance symmetry and consistency, this allows CEST to be trained efficiently on in-the-wild video clips. Both qualitative and quantitative experiments demonstrate the effectiveness of CEST.
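The conditional-estimation idea and the two reflectance priors described above can be illustrated with a small NumPy sketch. Everything concrete in it is an assumption, not the paper's implementation: the conditioning order (shape, then reflectance, viewpoint, illumination, simply following the order in which the abstract lists them), the parameter dimensions, the random linear heads standing in for trained networks, and the plain squared-error forms of the symmetry and consistency losses, which are not the paper's novel incorporation strategy. The helper names `make_estimators`, `conditional_estimate`, `symmetry_loss`, and `consistency_loss` are hypothetical, and the renderer and photometric reconstruction loss of the analysis-by-synthesis loop are omitted.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes; the abstract does not give CEST's parameter dimensions.
DIMS = {"shape": 8, "reflectance": 12, "viewpoint": 6, "illumination": 9}
# Assumed conditioning order (the order in which the abstract lists the parameters).
ORDER = ["shape", "reflectance", "viewpoint", "illumination"]
FEAT_DIM = 16  # hypothetical size of the image embedding


def make_estimators(feat_dim=FEAT_DIM):
    """Create one random linear head per parameter (stand-ins for trained networks).

    Head k takes the image features concatenated with all parameters
    estimated before it, so its input width grows along the chain.
    """
    heads, in_dim = {}, feat_dim
    for name in ORDER:
        heads[name] = rng.standard_normal((in_dim, DIMS[name])) * 0.1
        in_dim += DIMS[name]
    return heads


def conditional_estimate(features, heads):
    """Sequential estimation: p_k = f_k(image features, p_1, ..., p_{k-1})."""
    params, context = {}, features
    for name in ORDER:
        params[name] = np.tanh(context @ heads[name])
        # Append the new estimate so that later heads are conditioned on it.
        context = np.concatenate([context, params[name]], axis=-1)
    return params


def symmetry_loss(albedo):
    """Mean squared difference between an albedo map (..., H, W, 3) and its left-right mirror.

    Assumes the width axis of the (hypothetical) UV map is mirrored about the facial midline.
    """
    return float(np.mean((albedo - albedo[..., ::-1, :]) ** 2))


def consistency_loss(albedos):
    """Mean squared deviation of per-frame albedo maps (T, H, W, 3) from their mean over frames."""
    return float(np.mean((albedos - albedos.mean(axis=0, keepdims=True)) ** 2))


# Usage: a batch of 4 hypothetical image embeddings.
params = conditional_estimate(rng.standard_normal((4, FEAT_DIM)), make_estimators())
print({name: p.shape for name, p in params.items()})
# {'shape': (4, 8), 'reflectance': (4, 12), 'viewpoint': (4, 6), 'illumination': (4, 9)}
```

The only point the sketch makes is structural: each head's input includes every earlier estimate, so later parameters can depend on earlier ones. In a trained system the heads would be learned networks and the two reflectance terms would be added to a reconstruction objective.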

updated: Sun Oct 10 2021 14:02:19 GMT+0000 (UTC)

published: Sun Oct 10 2021 14:02:19 GMT+0000 (UTC)

arXiv
