Improving Diversity in Zero-Shot GAN Adaptation with Semantic Variations

Seogkyu Jeon; Bei Liu; Pilhyeon Lee; Kibeom Hong; Jianlong Fu; Hyeran Byun

Training deep generative models usually requires a large amount of data. To reduce the cost of data collection, zero-shot GAN adaptation reuses a well-trained generator to synthesize images of an unseen target domain without any additional training samples. Because no target-domain data is available, a textual description of the target domain and a vision-language model, e.g., CLIP, are used to guide the generator. However, with only a single representative text feature in place of real images, the synthesized images gradually lose diversity as the model is optimized, a failure known as mode collapse. To tackle this problem, we propose a novel method that finds semantic variations of the target text in the CLIP space. Specifically, we explore diverse semantic variations based on the informative text feature of the target domain while regularizing uncontrolled deviation of the semantic information. With the obtained variations, we design a novel directional moment loss that matches the first and second moments of the image and text direction distributions. Moreover, we introduce elastic weight consolidation and a relation consistency loss to preserve valuable content information from the source domain, e.g., appearance. Through extensive experiments, we demonstrate the efficacy of the proposed method in ensuring sample diversity across various zero-shot GAN adaptation scenarios. We also conduct ablation studies to validate the effect of each proposed component. Notably, our model achieves a new state of the art on zero-shot GAN adaptation in terms of both diversity and quality.
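
As a rough illustration of what matching the first and second moments of two direction distributions can look like, the NumPy sketch below compares the mean and covariance of normalized image directions against those of normalized text directions. It is not the paper's implementation: the function names, the squared-error form of the penalty, and the use of plain arrays in place of CLIP embeddings are all assumptions made for illustration.

```python
import numpy as np


def normalize(x, eps=1e-8):
    """Scale each row vector to (approximately) unit length."""
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)


def directional_moment_loss(src_img, tgt_img, src_txt, tgt_txt_variations):
    """Hypothetical moment-matching penalty between two sets of directions.

    src_img, tgt_img:      (N, D) embeddings of source and adapted images
    src_txt:               (D,)   embedding of the source-domain text
    tgt_txt_variations:    (M, D) embeddings of target-text variations
    Requires N >= 2 and M >= 2 so that covariances are defined.
    """
    # Directions from source to target in the shared embedding space.
    img_dirs = normalize(tgt_img - src_img)
    txt_dirs = normalize(tgt_txt_variations - src_txt)

    # First moment: difference between the mean directions.
    mean_gap = img_dirs.mean(axis=0) - txt_dirs.mean(axis=0)

    # Second moment: difference between the (D, D) covariance matrices.
    cov_gap = np.cov(img_dirs, rowvar=False) - np.cov(txt_dirs, rowvar=False)

    # Assumed penalty: sum of squared gaps for both moments.
    return float(np.sum(mean_gap ** 2) + np.sum(cov_gap ** 2))
```

Under these assumptions, a set of image directions that all point the same way has near-zero covariance, so it is penalized whenever the text variations are spread out. This mirrors the mode-collapse concern raised in the abstract.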

updated: Mon Aug 21 2023 08:12:28 GMT+0000 (UTC)

published: Mon Aug 21 2023 08:12:28 GMT+0000 (UTC)

arXiv
