RIGID: Recurrent GAN Inversion and Editing of Real Face Videos

Yangyang Xu; Shengfeng He; Kwan-Yee K. Wong; Ping Luo

GAN inversion is indispensable for applying the powerful editability of GANs to real images. However, existing methods invert video frames individually, which often leads to undesirable temporal inconsistency in the results. In this paper, we propose a unified recurrent framework, named Recurrent vIdeo GAN Inversion and eDiting (RIGID), to explicitly and simultaneously enforce temporally coherent GAN inversion and facial editing of real videos. Our approach models the temporal relations between the current and previous frames from three aspects. First, to enable faithful reconstruction of real videos, we maximize inversion fidelity and consistency by learning a temporally compensated latent code. Second, we observe that incoherent noise lies in the high-frequency domain and can be disentangled from the latent space. Third, to remove the inconsistency introduced by attribute manipulation, we propose an in-between frame composition constraint, under which an arbitrary frame must be a direct composite of its neighboring frames. Our unified framework learns the inherent coherence between input frames in an end-to-end manner; it is therefore agnostic to any specific attribute and can be applied to arbitrary editing of the same video without re-training. Extensive experiments demonstrate that RIGID outperforms state-of-the-art methods qualitatively and quantitatively in both inversion and editing tasks. The deliverables can be found at https://cnnlstm.github.io/RIGID
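To make the in-between frame composition idea more concrete, here is a minimal sketch, not the authors' implementation. It assumes that the "composite" of the neighboring frames is a plain linear blend and that each frame is a flat list of pixel intensities; the abstract does not specify how the composite is formed, so the function names, the `alpha` weight, and the L1 penalty are all hypothetical choices made for illustration.

```python
from typing import Sequence


def in_between_composition_loss(
    prev_frame: Sequence[float],
    mid_frame: Sequence[float],
    next_frame: Sequence[float],
    alpha: float = 0.5,
) -> float:
    """Mean absolute gap between a frame and a blend of its two neighbors.

    Assumption: the composite is a linear blend with weight `alpha`.
    The paper may use a different (for example motion-aware) composition.
    """
    if not (len(prev_frame) == len(mid_frame) == len(next_frame)):
        raise ValueError("all frames must have the same number of pixels")
    if len(mid_frame) == 0:
        raise ValueError("frames must not be empty")

    total = 0.0
    for p, m, n in zip(prev_frame, mid_frame, next_frame):
        composite = alpha * p + (1.0 - alpha) * n  # blended neighbor pixel
        total += abs(m - composite)                # L1 deviation from it
    return total / len(mid_frame)


def video_consistency_loss(frames: Sequence[Sequence[float]]) -> float:
    """Average the in-between loss over every interior frame of a clip."""
    if len(frames) < 3:
        return 0.0  # no interior frame, so nothing to penalize
    losses = [
        in_between_composition_loss(frames[t - 1], frames[t], frames[t + 1])
        for t in range(1, len(frames) - 1)
    ]
    return sum(losses) / len(losses)


if __name__ == "__main__":
    smooth = [[0.0, 0.0], [0.5, 0.5], [1.0, 1.0]]   # steady brightening
    flicker = [[0.0, 0.0], [1.0, 1.0], [0.0, 0.0]]  # one-frame flash
    print(video_consistency_loss(smooth))
    print(video_consistency_loss(flicker))
```

Under these assumptions, a clip whose middle frame sits exactly between its neighbors incurs no penalty, while a one-frame flicker is penalized in proportion to its size. In a trainable setting the same quantity would be computed on edited frames with a differentiable tensor library rather than Python lists.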

updated: Fri Aug 11 2023 12:17:24 GMT+0000 (UTC)

published: Fri Aug 11 2023 12:17:24 GMT+0000 (UTC)

arXiv
