StyleMask: Disentangling the Style Space of StyleGAN2 for Neural Face Reenactment

Stella Bounareli; Christos Tzelepis; Vasileios Argyriou; Ioannis Patras; Georgios Tzimiropoulos

In this paper, we address the problem of neural face reenactment: given a pair of source and target facial images, the goal is to transfer the target's pose (defined as the head pose and facial expression) to the source image while preserving the source's identity characteristics (e.g., facial shape and hair style), even in the challenging case where the source and target faces belong to different identities. In doing so, we address some of the limitations of state-of-the-art methods, namely that they a) depend on paired training data (i.e., source and target faces share the same identity), b) rely on labeled data during inference, and c) do not preserve identity under large head pose changes. More specifically, we propose a framework that, using unpaired, randomly generated facial images, learns to disentangle the identity characteristics of a face from its pose by incorporating the recently introduced style space S of StyleGAN2, a latent representation space that exhibits remarkable disentanglement properties. Capitalizing on this, we learn to mix a pair of source and target style codes using supervision from a 3D model. The resulting latent code, which is subsequently used for reenactment, consists of latent units corresponding only to the facial pose of the target and units corresponding only to the identity of the source, leading to a notable improvement in reenactment performance over recent state-of-the-art methods. We show quantitatively and qualitatively that the proposed method produces higher-quality results than the state of the art, even under extreme pose variations. Finally, we report results on real images by first embedding them in the latent space of the pretrained generator. We make the code and pretrained models publicly available at: https://github.com/StelaBou/StyleMask
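The mixing of source and target style codes described in the abstract can be pictured with a minimal sketch. The abstract does not give the mixing rule, so the per-channel linear blend below is an assumption, and the names `mix_style_codes`, `source_code`, `target_code`, and `pose_mask` are hypothetical. In the actual method the mask is learned under 3D-model supervision; here it is fixed by hand purely for illustration.

```python
from typing import List, Sequence


def mix_style_codes(
    source: Sequence[float],
    target: Sequence[float],
    mask: Sequence[float],
) -> List[float]:
    """Blend two style codes channel by channel (assumed rule, not from the paper).

    mask[i] = 1 takes channel i from the target (treated here as pose-related);
    mask[i] = 0 keeps channel i from the source (treated here as identity-related).
    """
    if not (len(source) == len(target) == len(mask)):
        raise ValueError("source, target and mask must have the same length")
    if any(m < 0.0 or m > 1.0 for m in mask):
        raise ValueError("mask values must lie in [0, 1]")
    return [m * t + (1.0 - m) * s for s, t, m in zip(source, target, mask)]


# Toy 4-channel codes; real style codes have thousands of channels.
source_code = [0.5, -1.0, 2.0, 0.0]
target_code = [1.5, 3.0, -2.0, 4.0]
pose_mask = [0.0, 1.0, 0.0, 1.0]  # hand-picked: channels 1 and 3 follow the target

mixed_code = mix_style_codes(source_code, target_code, pose_mask)
```

With a binary mask, each channel of the mixed code comes entirely from either the source or the target, which matches the abstract's description of units that correspond to the target's pose only or to the source's identity only.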

updated: Tue Sep 27 2022 13:22:35 GMT+0000 (UTC)

published: Tue Sep 27 2022 13:22:35 GMT+0000 (UTC)

arXiv
