Spatial Latent Representations in Generative Adversarial Networks for Image Generation

Maciej Sypetkowski

In the majority of GAN architectures, the latent space is defined as a set of vectors of given dimensionality. Such representations are not easily interpretable and do not capture spatial information of image content directly. In this work, we define a family of spatial latent spaces for StyleGAN2, capable of capturing more details and representing images that are out-of-sample in terms of the number and arrangement of object parts, such as an image of multiple faces or a face with more than two eyes. We propose a method for encoding images into our spaces, together with an attribute model capable of performing attribute editing in these spaces. We show that our spaces are effective for image manipulation and encode semantic information well. Our approach can be used on pre-trained generator models, and attribute editing can be done using pre-generated direction vectors, making the barrier to entry for experimentation and use extremely low. We propose a regularization method for optimizing latent representations, which equalizes the distributions of parts of latent spaces, making representations much closer to generated ones. We use it for encoding images into spatial spaces to obtain a significant improvement in quality while keeping semantics and the ability to use our attribute model for editing purposes. In total, our methods improve encoding quality by as much as 30% in terms of LPIPS score compared to standard methods, while keeping semantics. Additionally, we propose a StyleGAN2 training procedure on our spatial latent spaces, together with a custom spatial latent representation distribution that makes spatially closer elements in the representation more dependent on each other than farther elements. Such an approach improves the FID score by 29% on SpaceNet and is able to generate consistent images of arbitrary sizes on spatially homogeneous datasets, such as satellite imagery.
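The idea of a spatial latent distribution in which nearby elements are more dependent than distant ones can be illustrated with a small sketch. This is not the distribution used in the paper, whose definition the abstract does not give. It is one simple construction, assumed here for illustration only: i.i.d. Gaussian noise on a grid is box-averaged over a small neighbourhood, so adjacent cells share most of their noise terms and distant cells share none. The names `sample_spatial_latent`, `correlation`, and the `radius` parameter are hypothetical.

```python
import math
import random


def sample_spatial_latent(height, width, dim, radius=1, seed=None):
    """Sample a (height x width) grid of dim-dimensional latent vectors.

    Illustrative assumption (not the paper's method): i.i.d. Gaussian noise
    is box-averaged over a (2*radius+1)^2 window with wrap-around padding,
    so close cells share noise terms and far cells do not. Requires
    height and width to be at least 2*radius+1 for unit variance.
    """
    rng = random.Random(seed)
    noise = [[[rng.gauss(0.0, 1.0) for _ in range(dim)]
              for _ in range(width)] for _ in range(height)]
    k = 2 * radius + 1  # window side; a sum of k*k unit normals has std k
    latent = [[[0.0] * dim for _ in range(width)] for _ in range(height)]
    for y in range(height):
        for x in range(width):
            cell = latent[y][x]
            for dy in range(-radius, radius + 1):
                for dx in range(-radius, radius + 1):
                    src = noise[(y + dy) % height][(x + dx) % width]
                    for c in range(dim):
                        cell[c] += src[c] / k  # dividing by k keeps unit variance
    return latent


def correlation(u, v):
    """Pearson correlation between two equal-length vectors."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = math.sqrt(sum((a - mu) ** 2 for a in u))
    sv = math.sqrt(sum((b - mv) ** 2 for b in v))
    return cov / (su * sv)


z = sample_spatial_latent(8, 8, 256, radius=1, seed=0)
near = correlation(z[0][0], z[0][1])  # adjacent cells share 6 of 9 noise terms
far = correlation(z[0][0], z[0][4])   # windows do not overlap
```

In this construction the correlation between two cells is determined by how much their averaging windows overlap, so `near` should come out clearly positive while `far` should stay close to zero. Because the sampler is defined per grid cell, the same procedure applies to grids of any size, which is the property the abstract relies on for generating images of arbitrary sizes.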

updated: Sat Mar 25 2023 20:01:11 GMT+0000 (UTC)

published: Sat Mar 25 2023 20:01:11 GMT+0000 (UTC)

arXiv

