Multimodal Conditional Image Synthesis with Product-of-Experts GANs

Xun Huang; Arun Mallya; Ting-Chun Wang; Ming-Yu Liu

Existing conditional image synthesis frameworks generate images based on user inputs in a single modality, such as text, segmentation, sketch, or style reference. They are often unable to leverage multimodal user inputs when available, which reduces their practicality. To address this limitation, we propose the Product-of-Experts Generative Adversarial Networks (PoE-GAN) framework, which can synthesize images conditioned on multiple input modalities or any subset of them, even the empty set. PoE-GAN consists of a product-of-experts generator and a multimodal multiscale projection discriminator. Through our carefully designed training scheme, PoE-GAN learns to synthesize images with high quality and diversity. Besides advancing the state of the art in multimodal conditional image synthesis, PoE-GAN also outperforms the best existing unimodal conditional image synthesis approaches when tested in the unimodal setting. The project website is available at https://deepimagination.github.io/PoE-GAN .
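To illustrate why a product-of-experts formulation copes naturally with any subset of modalities, including the empty set, the following is a minimal sketch and not the PoE-GAN implementation. It assumes, purely for illustration, that each modality contributes a one-dimensional Gaussian expert over a latent variable and that a standard-normal prior expert is always present; the abstract does not specify the form of the experts. The function name `product_of_gaussians` and the `(mean, variance)` representation are hypothetical. The product of Gaussian densities is again Gaussian, with a precision equal to the sum of the experts' precisions and a precision-weighted mean.

```python
from typing import Optional, Sequence, Tuple

# Hypothetical representation: (mean, variance) of a 1-D Gaussian expert.
Gaussian = Tuple[float, float]


def product_of_gaussians(
    experts: Sequence[Optional[Gaussian]],
    prior: Gaussian = (0.0, 1.0),
) -> Gaussian:
    """Combine a prior with whichever experts are present.

    A missing modality is passed as None and simply skipped, so the
    empty set of modalities falls back to the prior alone.
    """
    prior_mean, prior_var = prior
    precision = 1.0 / prior_var            # sum of inverse variances
    weighted_mean = prior_mean / prior_var  # sum of mean / variance

    for expert in experts:
        if expert is None:  # modality not provided by the user
            continue
        mean, var = expert
        precision += 1.0 / var
        weighted_mean += mean / var

    combined_var = 1.0 / precision
    return weighted_mean * combined_var, combined_var


if __name__ == "__main__":
    # Assumed example inputs: a "text" expert and a "sketch" expert.
    text_expert = (2.0, 1.0)
    sketch_expert = (4.0, 0.5)
    print(product_of_gaussians([]))                        # prior only
    print(product_of_gaussians([text_expert, None]))       # one modality
    print(product_of_gaussians([text_expert, sketch_expert]))
```

Each additional expert can only increase the total precision, so supplying more modalities narrows the combined distribution, while omitting all of them leaves the prior unchanged. That matches the unconditional case described in the abstract.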

updated: Thu Dec 09 2021 18:59:00 GMT+0000 (UTC)

published: Thu Dec 09 2021 18:59:00 GMT+0000 (UTC)

arXiv
