Style-Restricted GAN: Multi-Modal Translation with Style Restriction Using Generative Adversarial Networks

Sho Inoue; Tad Gonsalves

Unpaired image-to-image translation using Generative Adversarial Networks (GANs) has been successful in converting images among multiple domains. Moreover, recent studies have shown ways to diversify the outputs of the generator. However, since there are no restrictions on how the generator diversifies the results, it is likely to translate some unexpected features. In this paper, we propose Style-Restricted GAN (SRGAN) to demonstrate the importance of controlling the encoded features used in the style-diversifying process. More specifically, instead of the KL divergence loss, we adopt three new losses to restrict the distribution of the encoded features: batch KL divergence loss, correlation loss, and histogram imitation loss. Further, the encoder is pre-trained on classification tasks before being used in the translation process. The study reports qualitative results as well as quantitative results measured with Precision, Recall, Density, and Coverage. The three proposed losses increase the level of diversity compared to the conventional KL loss. In particular, SRGAN is found to translate with higher diversity and without changing the class-unrelated features in the CelebA face dataset. To conclude, two experiments demonstrate the importance of well-regulated encoded features. Our implementation is available at https://github.com/shinshoji01/Style-Restricted_GAN.
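
The abstract names the three losses but does not give their formulas, so the following is only a minimal sketch of one plausible reading of two of them, not the authors' implementation (which is in the repository linked above). It assumes that a "batch KL divergence loss" compares the per-dimension mean and variance of the encoded features, measured across a batch, with a standard normal distribution, and that a "correlation loss" penalizes correlation between different dimensions of the encoded features. The function names `batch_kl_loss` and `correlation_loss` are hypothetical, and the histogram imitation loss is not sketched because the abstract gives no basis for guessing its form.

```python
import math


def batch_kl_loss(codes):
    """Assumed form: KL(N(mean, var) || N(0, 1)) summed over dimensions,
    where mean and var are measured across the batch for each dimension."""
    n, dims = len(codes), len(codes[0])
    total = 0.0
    for d in range(dims):
        column = [row[d] for row in codes]
        mean = sum(column) / n
        var = sum((x - mean) ** 2 for x in column) / n + 1e-8  # avoid log(0)
        total += 0.5 * (var + mean ** 2 - 1.0 - math.log(var))
    return total


def correlation_loss(codes):
    """Assumed form: mean absolute Pearson correlation between every pair
    of distinct dimensions of the encoded features."""
    n, dims = len(codes), len(codes[0])
    columns = [[row[d] for row in codes] for d in range(dims)]
    means = [sum(c) / n for c in columns]
    stds = [
        math.sqrt(sum((x - m) ** 2 for x in c) / n + 1e-8)
        for c, m in zip(columns, means)
    ]
    total, pairs = 0.0, 0
    for i in range(dims):
        for j in range(i + 1, dims):
            cov = sum(
                (a - means[i]) * (b - means[j])
                for a, b in zip(columns[i], columns[j])
            ) / n
            total += abs(cov / (stds[i] * stds[j]))
            pairs += 1
    return total / pairs if pairs else 0.0
```

Under these assumptions, a batch whose features already have zero mean and unit variance in every dimension gives a batch KL value near zero, and a batch whose dimensions are linearly dependent gives a correlation value near one. In a training setup the same statistics would be computed on tensors with an automatic-differentiation library so that the losses can be backpropagated through the encoder.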

updated: Wed Jul 14 2021 06:57:08 GMT+0000 (UTC)

published: Mon May 17 2021 05:58:33 GMT+0000 (UTC)

arXiv
