LiP-Flow: Learning Inference-time Priors for Codec Avatars via Normalizing Flows in Latent Space

Emre Aksan; Shugao Ma; Akin Caliskan; Stanislav Pidhorskyi; Alexander Richard; Shih-En Wei; Jason Saragih; Otmar Hilliges

Neural face avatars that are trained from multi-view data captured in camera domes can produce photo-realistic 3D reconstructions. However, at inference time, they must be driven by limited inputs such as partial views recorded by headset-mounted cameras or a front-facing camera, and sparse facial landmarks. To mitigate this asymmetry, we introduce a prior model that is conditioned on the runtime inputs and tie this prior space to the 3D face model via a normalizing flow in the latent space. Our proposed model, LiP-Flow, consists of two encoders that learn representations from the rich training-time and impoverished inference-time observations. A normalizing flow bridges the two representation spaces and transforms latent samples from one domain to another, allowing us to define a latent likelihood objective. We train our model end-to-end to maximize the similarity of both representation spaces and the reconstruction quality, making the 3D face model aware of the limited driving signals. We conduct extensive evaluations where the latent codes are optimized to reconstruct 3D avatars from partial or sparse observations. We show that our approach leads to an expressive and effective prior, capturing facial dynamics and subtle expressions better.
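
To make the idea of a latent likelihood objective concrete, the following is a minimal sketch using the change-of-variables formula. It is not the authors' implementation: the abstract does not specify the flow architecture, the form of the prior, or the direction of the mapping. The elementwise affine flow, the diagonal Gaussian prior, and all function and parameter names below are assumptions made for illustration only. A latent code z from the rich (training-time) encoder is mapped by the flow to u, and its log-likelihood is the prior's log-density at u plus the log-determinant of the flow's Jacobian.

```python
import numpy as np

def affine_flow_forward(z, log_scale, shift):
    """Hypothetical elementwise affine flow: u = z * exp(log_scale) + shift.

    z has shape (batch, dim); log_scale and shift have shape (dim,).
    Returns u and log|det J|, which for this flow is sum(log_scale).
    """
    u = z * np.exp(log_scale) + shift
    log_det = np.sum(log_scale) * np.ones(z.shape[0])
    return u, log_det

def gaussian_log_prob(u, mean, log_std):
    """Log-density of a diagonal Gaussian, summed over latent dimensions."""
    normalized_sq = ((u - mean) / np.exp(log_std)) ** 2
    per_dim = -0.5 * normalized_sq - log_std - 0.5 * np.log(2.0 * np.pi)
    return np.sum(per_dim, axis=-1)

def latent_log_likelihood(z, log_scale, shift, prior_mean, prior_log_std):
    """Change of variables: log p(z) = log p_prior(f(z)) + log|det df/dz|.

    prior_mean and prior_log_std stand in for the output of a prior
    encoder conditioned on the runtime inputs (an assumption here).
    """
    u, log_det = affine_flow_forward(z, log_scale, shift)
    return gaussian_log_prob(u, prior_mean, prior_log_std) + log_det

# Usage: two latent codes of dimension 4 scored under a standard normal prior.
rng = np.random.default_rng(0)
z = rng.normal(size=(2, 4))
log_scale = np.zeros(4)   # identity flow, for illustration
shift = np.zeros(4)
ll = latent_log_likelihood(z, log_scale, shift, np.zeros(4), np.zeros(4))
print(ll.shape)
```

In training, the negative of this quantity would serve as a loss term alongside a reconstruction loss. A practical model would replace the affine map with a learned, more expressive invertible network.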

updated: Tue Mar 15 2022 13:22:57 GMT+0000 (UTC)

published: Tue Mar 15 2022 13:22:57 GMT+0000 (UTC)

arXiv
