HyperStyle: StyleGAN Inversion with HyperNetworks for Real Image Editing

Yuval Alaluf; Omer Tov; Ron Mokady; Rinon Gal; Amit H. Bermano

The inversion of real images into StyleGAN's latent space is a well-studied problem. Nevertheless, applying existing approaches to real-world scenarios remains an open challenge, due to an inherent trade-off between reconstruction and editability: latent space regions which can accurately represent real images typically suffer from degraded semantic control. Recent work proposes to mitigate this trade-off by fine-tuning the generator to add the target image to well-behaved, editable regions of the latent space. While promising, this fine-tuning scheme is impractical for prevalent use as it requires a lengthy training phase for each new image. In this work, we introduce this approach into the realm of encoder-based inversion. We propose HyperStyle, a hypernetwork that learns to modulate StyleGAN's weights to faithfully express a given image in editable regions of the latent space. A naive modulation approach would require training a hypernetwork with over three billion parameters. Through careful network design, we reduce this to be in line with existing encoders. HyperStyle yields reconstructions comparable to those of optimization techniques with the near real-time inference capabilities of encoders. Lastly, we demonstrate HyperStyle's effectiveness on several applications beyond the inversion task, including the editing of out-of-domain images which were never seen during training.
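The weight-modulation idea and the parameter-count problem mentioned in the abstract can be illustrated with a small sketch. Everything here is an assumption made for illustration: the multiplicative rule `w * (1 + offset)`, the function names, the layer shape (512 x 512 channels, 3 x 3 kernels), and the 512-dimensional feature vector are hypothetical and not taken from the abstract. The sketch only shows why a head that predicts one offset per generator weight grows very large, and how sharing one offset per kernel shrinks it; it is not a description of the authors' actual architecture.

```python
def modulate(weights, offsets):
    """Scale each weight by (1 + offset).

    Assumed modulation rule for illustration: a zero offset leaves the
    original generator weight unchanged.
    """
    if len(weights) != len(offsets):
        raise ValueError("weights and offsets must have the same length")
    return [w * (1.0 + d) for w, d in zip(weights, offsets)]


def naive_head_params(feature_dim, out_ch, in_ch, kernel):
    """Weights in one linear layer (bias ignored) that maps a feature
    vector to a separate offset for every weight of a conv layer."""
    return feature_dim * out_ch * in_ch * kernel * kernel


def shared_kernel_head_params(feature_dim, out_ch, in_ch):
    """Same head, but predicting a single offset per (out, in) channel
    pair that is shared across the kernel's spatial positions."""
    return feature_dim * out_ch * in_ch


# Hypothetical layer: 512 x 512 channels, 3 x 3 kernels, 512-dim features.
naive = naive_head_params(512, 512, 512, 3)
shared = shared_kernel_head_params(512, 512, 512)

print(modulate([1.0, 2.0], [0.0, 0.5]))
print(naive, shared, naive // shared)
```

Under these assumed sizes, a single naive head for one layer already exceeds a billion weights, which gives a sense of how a naive design across many layers could reach the scale the abstract mentions.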

updated: Tue Nov 30 2021 18:56:30 GMT+0000 (UTC)

published: Tue Nov 30 2021 18:56:30 GMT+0000 (UTC)

arXiv
