On Conditioning the Input Noise for Controlled Image Generation with Diffusion Models

Vedant Singh; Surgan Jandial; Ayush Chopra; Siddharth Ramesh; Balaji Krishnamurthy; Vineeth N. Balasubramanian

Conditional image generation has paved the way for several breakthroughs in image editing, stock photo generation, and 3-D object generation. It remains a significant area of interest with the rise of new state-of-the-art methods based on diffusion models. However, diffusion models provide very little control over the generated image, which has led subsequent works to explore techniques such as classifier guidance, which provides a way to trade off diversity against fidelity. In this work, we explore techniques to condition diffusion models with carefully crafted input noise artifacts, which allows generation of images conditioned on semantic attributes. This differs from existing approaches, which input Gaussian noise and introduce further conditioning at the diffusion model's inference step. Our experiments over several examples and conditional settings show the potential of our approach.

updated: Sun May 08 2022 13:18:14 GMT+0000 (UTC)

published: Sun May 08 2022 13:18:14 GMT+0000 (UTC)

arXiv
