Zero-shot Text-driven Physically Interpretable Face Editing

Yapeng Meng; Songru Yang; Xu Hu; Rui Zhao; Lincheng Li; Zhenwei Shi; Zhengxia Zou

This paper proposes a novel and physically interpretable method for face editing based on arbitrary text prompts. Unlike previous GAN-inversion-based face editing methods, which manipulate the latent space of GANs, and diffusion-based methods, which model image manipulation as a reverse diffusion process, we regard face editing as imposing vector flow fields on face images. These fields represent the offsets in spatial coordinates and color for each image pixel. Under this paradigm, we represent the vector flow field in two ways: 1) explicitly, as rasterized tensors, and 2) implicitly, as continuous, smooth, and resolution-agnostic neural fields, leveraging recent advances in implicit neural representations. The flow vectors are iteratively optimized under the guidance of the pre-trained Contrastive Language-Image Pretraining (CLIP) model by maximizing the correlation between the edited image and the text prompt. We also propose a learning-based one-shot face editing framework, which is fast and adaptable to any text prompt input. Our method can also be flexibly extended to real-time video face editing. Compared with state-of-the-art text-driven face editing methods, our method generates physically interpretable face editing results with high identity consistency and image quality. Our code will be made publicly available.
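To make the explicit (rasterized) representation concrete, the following is a minimal NumPy sketch of how a per-pixel flow field might be applied to an image. It is an illustration under stated assumptions, not the authors' implementation: the function name `apply_flow`, the `(dy, dx)` offset convention, backward warping with bilinear sampling, and additive color offsets are all choices made here. The CLIP-guided optimization of the flow vectors is not shown.

```python
import numpy as np

def apply_flow(image, spatial_flow, color_flow):
    """Warp `image` by per-pixel spatial offsets, then add per-pixel color offsets.

    Hypothetical helper for illustration only (assumed conventions):
      image:        (H, W, C) float array with values in [0, 1]
      spatial_flow: (H, W, 2) float array of (dy, dx) offsets, in pixels
      color_flow:   (H, W, C) float array of additive color offsets
    """
    h, w, _ = image.shape
    ys, xs = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")

    # Backward warping: each output pixel samples the input at its own
    # position plus its offset, clamped to the image border.
    sy = np.clip(ys + spatial_flow[..., 0], 0, h - 1)
    sx = np.clip(xs + spatial_flow[..., 1], 0, w - 1)

    # Integer neighbours and fractional weights for bilinear interpolation.
    y0 = np.floor(sy).astype(int)
    x0 = np.floor(sx).astype(int)
    y1 = np.minimum(y0 + 1, h - 1)
    x1 = np.minimum(x0 + 1, w - 1)
    wy = (sy - y0)[..., None]
    wx = (sx - x0)[..., None]

    top = image[y0, x0] * (1 - wx) + image[y0, x1] * wx
    bottom = image[y1, x0] * (1 - wx) + image[y1, x1] * wx
    warped = top * (1 - wy) + bottom * wy

    # The color flow is applied after the spatial warp; keep values in range.
    return np.clip(warped + color_flow, 0.0, 1.0)


# Example: shift every pixel's sampling position one column to the right.
rng = np.random.default_rng(0)
face = rng.random((4, 5, 3))
shift = np.zeros((4, 5, 2))
shift[..., 1] = 1.0
edited = apply_flow(face, shift, np.zeros_like(face))
```

Because both offset arrays have a direct geometric or photometric meaning per pixel, a zero flow field leaves the image unchanged, which is one way to read the "physically interpretable" claim in the abstract.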

updated: Fri Aug 11 2023 07:20:24 GMT+0000 (UTC)

published: Fri Aug 11 2023 07:20:24 GMT+0000 (UTC)

arXiv
