SINE: SINgle Image Editing with Text-to-Image Diffusion Models

Zhixing Zhang; Ligong Han; Arnab Ghosh; Dimitris Metaxas; Jian Ren

Recent work on diffusion models has demonstrated a strong capability for conditioning image generation, e.g., text-guided image synthesis. This success has inspired many efforts to use large-scale pre-trained diffusion models to tackle a challenging problem: real image editing. Existing methods in this area learn a unique textual token corresponding to several images containing the same object. In many circumstances, however, only one image is available, such as the painting Girl with a Pearl Earring. Applying existing methods that fine-tune pre-trained diffusion models to a single image causes severe overfitting. Information leakage from the pre-trained diffusion models leaves the edit unable to keep the same content as the given image while creating the new features described by the language guidance. This work aims to address the problem of single-image editing. We propose a novel model-based guidance, built upon classifier-free guidance, so that the knowledge from the model trained on a single image can be distilled into the pre-trained diffusion model, enabling content creation even with one given image. Additionally, we propose a patch-based fine-tuning scheme that effectively helps the model generate images of arbitrary resolution. We provide extensive experiments to validate the design choices of our approach and show promising editing capabilities, including style change, content addition, and object manipulation. The code is available for research purposes at https://github.com/zhang-zx/SINE.git.
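The abstract does not spell out how the model-based guidance is computed, so the following is only a minimal sketch under stated assumptions. The first function is standard classifier-free guidance. The second is one plausible way to build on it: blend the noise prediction of the pre-trained model (conditioned on the edit prompt) with that of the single-image fine-tuned model, then apply the usual guidance step. The function names, the blend weight `v`, and the choice of which model supplies the unconditional prediction are all hypothetical and are not taken from the abstract or from the SINE repository. Noise predictions are represented as plain lists of floats to keep the example self-contained.

```python
from typing import List, Sequence


def classifier_free_guidance(eps_cond: Sequence[float],
                             eps_uncond: Sequence[float],
                             w: float) -> List[float]:
    """Standard classifier-free guidance on noise predictions.

    Moves the unconditional prediction toward the conditional one
    by guidance weight w: eps_uncond + w * (eps_cond - eps_uncond).
    """
    return [u + w * (c - u) for c, u in zip(eps_cond, eps_uncond)]


def model_based_guidance(eps_pretrained_cond: Sequence[float],
                         eps_finetuned_cond: Sequence[float],
                         eps_uncond: Sequence[float],
                         w: float,
                         v: float) -> List[float]:
    """Hypothetical model-based variant (an assumption, not the paper's code).

    Mixes the conditional predictions of the pre-trained model and the
    single-image fine-tuned model with weight v in [0, 1], then applies
    classifier-free guidance to the mixture.
    v = 1 uses only the pre-trained model; v = 0 uses only the fine-tuned one.
    """
    mixed = [v * p + (1.0 - v) * f
             for p, f in zip(eps_pretrained_cond, eps_finetuned_cond)]
    return classifier_free_guidance(mixed, eps_uncond, w)


if __name__ == "__main__":
    # Toy two-element "noise predictions", for illustration only.
    guided = model_based_guidance(
        eps_pretrained_cond=[1.0, 2.0],
        eps_finetuned_cond=[3.0, 4.0],
        eps_uncond=[0.0, 0.0],
        w=2.0,
        v=0.5,
    )
    print(guided)
```

In this sketch, `v` trades off fidelity to the single training image (fine-tuned model) against the editing freedom of the pre-trained model, while `w` plays its usual role as the guidance scale.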

updated: Thu Dec 08 2022 18:57:13 GMT+0000 (UTC)

published: Thu Dec 08 2022 18:57:13 GMT+0000 (UTC)

arXiv
