Multi-Modal Face Stylization with a Generative Prior

Mengtian Li; Yi Dong; Minxuan Lin; Haibin Huang; Pengfei Wan; Chongyang Ma

In this work, we introduce a new approach for artistic face stylization. Although existing methods achieve impressive results on this task, there is still room for improvement in generating high-quality stylized faces with diverse styles and accurate facial reconstruction. Our proposed framework, MMFS, supports multi-modal face stylization by integrating StyleGAN into an encoder-decoder architecture. Specifically, we use the mid-resolution and high-resolution layers of StyleGAN as the decoder to generate high-quality faces, and we align its low-resolution layer with the encoder so that details of the input face are extracted and preserved. We also introduce a two-stage training strategy. In the first stage, we train the encoder to align its feature maps with those of StyleGAN, which enables faithful reconstruction of input faces. In the second stage, the entire network is fine-tuned on artistic data for stylized face generation. To apply the fine-tuned model to zero-shot and one-shot stylization tasks, we train an additional mapping network from the large-scale Contrastive Language-Image Pre-training (CLIP) space to the latent w+ space of the fine-tuned StyleGAN. Qualitative and quantitative experiments show that our framework achieves superior face stylization performance in both one-shot and zero-shot stylization tasks, outperforming state-of-the-art methods by a large margin.
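The layer split and the two-stage schedule described in the abstract can be sketched as simple bookkeeping. This is only an illustration, not the authors' implementation: the function and class names are hypothetical, the alignment resolution of 32 and the 1024-pixel output are assumed values that the abstract does not state, and the w+ size follows the usual StyleGAN2 convention of 2·log2(R) − 2 latent vectors of 512 dimensions.

```python
import math
from dataclasses import dataclass


def synthesis_resolutions(output_res: int) -> list:
    """Block resolutions of a StyleGAN2-style synthesis network: 4, 8, ..., output_res."""
    exponent = int(math.log2(output_res))
    if 2 ** exponent != output_res or output_res < 4:
        raise ValueError("output_res must be a power of two >= 4")
    return [2 ** k for k in range(2, exponent + 1)]


def split_generator(output_res: int, align_res: int):
    """Split blocks into those the encoder is aligned with and those used as decoder.

    align_res is an assumed hyperparameter; the abstract only says that the
    low-resolution part is aligned with the encoder.
    """
    resolutions = synthesis_resolutions(output_res)
    if align_res not in resolutions:
        raise ValueError("align_res must be one of the block resolutions")
    aligned = [r for r in resolutions if r <= align_res]
    decoder = [r for r in resolutions if r > align_res]
    return aligned, decoder


def wplus_shape(output_res: int, w_dim: int = 512):
    """Shape of a w+ code (one w vector per style input) under the StyleGAN2 convention."""
    return (2 * int(math.log2(output_res)) - 2, w_dim)


@dataclass(frozen=True)
class Stage:
    name: str
    trainable: tuple  # modules updated in this stage
    goal: str


# Two-stage schedule as described in the abstract.
STAGES = (
    Stage("stage 1", ("encoder",), "align encoder features with StyleGAN; reconstruct input faces"),
    Stage("stage 2", ("encoder", "decoder"), "fine-tune the entire network on artistic data"),
)

if __name__ == "__main__":
    aligned, decoder = split_generator(output_res=1024, align_res=32)
    print("aligned with encoder:", aligned)
    print("used as decoder:", decoder)
    print("w+ shape the CLIP mapper would need to output:", wplus_shape(1024))
    for stage in STAGES:
        print(stage.name, "trains", stage.trainable, "->", stage.goal)
```

Under these assumptions, a mapping network from CLIP space would take one CLIP embedding and emit a tensor with the shape returned by `wplus_shape`, which the fine-tuned decoder then consumes.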

updated: Mon May 29 2023 11:01:31 GMT+0000 (UTC)

published: Mon May 29 2023 11:01:31 GMT+0000 (UTC)

arXiv
