Rethinking conditional GAN training: An approach using geometrically structured latent manifolds

Sameera Ramasinghe; Moshiur Farazi; Salman Khan; Nick Barnes; Stephen Gould

Conditional GANs (cGANs), in their rudimentary form, suffer from critical drawbacks such as a lack of diversity in generated outputs and distortion between the latent and output manifolds. Although efforts have been made to improve results, these approaches can suffer from unpleasant side effects such as a topology mismatch between the latent and output spaces. In contrast, we tackle this problem from a geometrical perspective and propose a novel training mechanism that increases both the diversity and the visual quality of a vanilla cGAN by systematically encouraging a bi-Lipschitz mapping between the latent and output manifolds. We validate the efficacy of our solution on a baseline cGAN (i.e., Pix2Pix) that lacks diversity, and show that by modifying only its training mechanism (i.e., with our proposed Pix2Pix-Geo), one can achieve more diverse and realistic outputs on a broad set of image-to-image translation tasks. Code is available at https://github.com/samgregoost/Rethinking-CGANs.
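
For reference, the bi-Lipschitz property named in the abstract has a standard textbook definition, sketched below. The symbols f, K, d_Z and d_Y are notation assumed here for illustration only; the paper's own formulation (for example, its choice of metrics, or a local rather than global condition) may differ.

```latex
% Assumed notation: f maps the latent space (Z, d_Z) to the output space (Y, d_Y).
% f is bi-Lipschitz if there exists a constant K >= 1 such that,
% for all z_1, z_2 in Z:
\[
\frac{1}{K}\, d_{\mathcal{Z}}(z_1, z_2)
\;\le\;
d_{\mathcal{Y}}\bigl(f(z_1), f(z_2)\bigr)
\;\le\;
K\, d_{\mathcal{Z}}(z_1, z_2).
\]
```

Under this definition, the lower bound keeps distinct latent codes from collapsing onto the same output, and the upper bound limits how far the mapping can stretch distances.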

updated: Wed Jun 02 2021 11:50:46 GMT+0000 (UTC)

published: Wed Nov 25 2020 22:54:11 GMT+0000 (UTC)

arXiv
