IntereStyle: Encoding an Interest Region for Robust StyleGAN Inversion

Seungjun Moon; Gyeong-Moon Park

Manipulation of real-world images has recently become highly sophisticated, driven by the development of Generative Adversarial Networks (GANs) and corresponding encoders, which embed real-world images into the latent space. However, designing GAN encoders remains a challenging task due to the trade-off between distortion and perception. In this paper, we point out that existing encoders try to lower the distortion not only on the interest region, e.g., the human facial region, but also on the uninterest region, e.g., background patterns and obstacles. However, most uninterest regions in real-world images are out-of-distribution (OOD) and cannot be ideally reconstructed by generative models. Moreover, we empirically find that an uninterest region overlapping the interest region can mangle the original features of the interest region, e.g., a microphone overlapping a facial region is inverted into a white beard. As a result, lowering the distortion of the whole image while maintaining perceptual quality is very challenging. To overcome this trade-off, we propose a simple yet effective encoder training scheme, coined IntereStyle, which facilitates encoding by focusing on the interest region. IntereStyle steers the encoder to disentangle the encodings of the interest and uninterest regions. To this end, we iteratively filter the information of the uninterest region to regulate its negative impact. We demonstrate that IntereStyle achieves both lower distortion and higher perceptual quality than existing state-of-the-art encoders. In particular, our model robustly conserves features of the original images, which yields robust image editing and style mixing results. We will release our code with the pre-trained model after the review.
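The abstract does not give the training objective, so the following is only a minimal illustrative sketch of the underlying idea, not the authors' implementation. It compares a distortion measured over the whole image with one restricted to an interest region. The names `masked_distortion` and `full_distortion`, the use of mean squared error, and the binary mask format are all assumptions made for illustration.

```python
from typing import List

Image = List[List[float]]  # grayscale image as rows of pixel values (assumed format)
Mask = List[List[int]]     # 1 = interest region, 0 = uninterest region (assumed format)


def masked_distortion(original: Image, reconstructed: Image, mask: Mask) -> float:
    """Mean squared error computed only over pixels where the mask is 1."""
    total = 0.0
    count = 0
    for row_o, row_r, row_m in zip(original, reconstructed, mask):
        for o, r, m in zip(row_o, row_r, row_m):
            if m:
                total += (o - r) ** 2
                count += 1
    if count == 0:
        raise ValueError("mask selects no pixels")
    return total / count


def full_distortion(original: Image, reconstructed: Image) -> float:
    """Mean squared error over every pixel, interest and uninterest alike."""
    ones = [[1] * len(row) for row in original]
    return masked_distortion(original, reconstructed, ones)


# Toy 2x2 example: the reconstruction is exact inside the interest region
# and wrong only at the single uninterest pixel (bottom-right).
original = [[0.0, 1.0], [1.0, 0.0]]
reconstructed = [[0.0, 1.0], [1.0, 1.0]]
interest_mask = [[1, 1], [1, 0]]

print(masked_distortion(original, reconstructed, interest_mask))  # 0.0
print(full_distortion(original, reconstructed))                   # 0.25
```

In this toy case the whole-image measure penalizes the reconstruction for an error that lies entirely in the uninterest region, while the masked measure does not. This is the kind of pressure the abstract argues pushes existing encoders to spend capacity on regions a generative model cannot reconstruct well.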

updated: Tue Nov 15 2022 04:31:11 GMT+0000 (UTC)

published: Thu Sep 22 2022 06:31:07 GMT+0000 (UTC)

arXiv
