Fine-grained Image Editing by Pixel-wise Guidance Using Diffusion Models

Naoki Matsunaga; Masato Ishii; Akio Hayakawa; Kenji Suzuki; Takuya Narihira

Generative models, particularly GANs, have been used for image editing. Although GAN-based methods perform well at generating plausible content aligned with the user's intentions, they struggle to strictly preserve the content outside the editing region. To address this issue, we use diffusion models instead of GANs and propose a novel image-editing method based on pixel-wise guidance. Specifically, we first train pixel classifiers with a small amount of annotated data and then estimate the semantic segmentation map of a target image. Users then manipulate the map to specify how the image should be edited. The diffusion model generates the edited image under guidance from the pixel-wise classifiers, so that the resulting image aligns with the manipulated map. Because the guidance is applied pixel-wise, the proposed method can create plausible content in the editing region while preserving the content outside it. Experimental results validate the advantages of the proposed method both quantitatively and qualitatively.
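
To make the idea of pixel-wise guidance concrete, here is a minimal toy sketch. It assumes the guidance takes the form of standard classifier guidance (Dhariwal and Nichol, 2021), in which the denoiser's predicted mean is shifted by scale * variance * the gradient of log p(y | x_t), with log p(y | x_t) taken as the sum of per-pixel log-probabilities of the user-edited label map. The classifier here is hypothetical: a linear softmax over the raw value of a single grayscale pixel, with made-up weights `W` and `B`, and it ignores the timestep. A real pixel classifier would be noise-aware and would look at spatial context, so its gradients would not be purely element-wise. The names `class_probs`, `grad_log_prob`, and `guided_mean` are illustrative only and do not come from the paper, whose exact formulation may differ.

```python
import math
from typing import List, Sequence

# Hypothetical toy pixel classifier (illustration only, not the paper's model):
# logit_k(v) = W[k] * v + B[k] for a single grayscale pixel value v.
# Class 0 ("background") favours dark pixels, class 1 ("object") bright ones;
# the decision boundary sits at v = 0.5.
W = [-4.0, 4.0]
B = [2.0, -2.0]


def class_probs(v: float) -> List[float]:
    """Softmax over the toy classifier's logits for one pixel value."""
    logits = [w * v + b for w, b in zip(W, B)]
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def grad_log_prob(v: float, label: int) -> float:
    """Analytic d/dv log p(label | v) for a linear-softmax classifier: W[label] - E_p[W]."""
    p = class_probs(v)
    return W[label] - sum(pk * wk for pk, wk in zip(p, W))


def guided_mean(
    x_t: Sequence[float],
    mu: Sequence[float],
    target_map: Sequence[int],
    scale: float,
    variance: float,
) -> List[float]:
    """Classifier-guided mean: mu + scale * variance * grad_x sum_i log p(y_i | x_t).

    The gradient is evaluated at the noisy image x_t. Because this toy classifier
    looks at one pixel at a time, the gradient of the summed per-pixel
    log-likelihood reduces to an element-wise computation.
    """
    if not (len(x_t) == len(mu) == len(target_map)):
        raise ValueError("x_t, mu and target_map must have the same length")
    return [
        m + scale * variance * grad_log_prob(x, y)
        for x, m, y in zip(x_t, mu, target_map)
    ]


if __name__ == "__main__":
    x_t = [0.1, 0.1, 0.9]   # current noisy image (3 pixels)
    mu = [0.1, 0.1, 0.9]    # hypothetical denoiser's unconditional mean
    edited_map = [0, 1, 1]  # user relabels pixel 1 from class 0 to class 1
    print(guided_mean(x_t, mu, edited_map, scale=1.0, variance=0.1))
```

With these toy numbers the guided mean comes out at roughly [0.069, 0.869, 0.931]. The relabelled pixel is pushed across the 0.5 boundary toward the "object" class. The two pixels whose labels were unchanged move only slightly, because the classifier already agrees with their target labels. In a full sampler, x_{t-1} would then be drawn from a Gaussian centred on this guided mean with the given variance.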

Published on arXiv: Mon Dec 05 2022 04:39:08 GMT+0000 (UTC); last updated at the same time.