PRedItOR: Text Guided Image Editing with Diffusion Prior

Hareesh Ravi; Sachin Kelkar; Midhun Harikumar; Ajinkya Kale

Diffusion models have shown remarkable capabilities in generating high-quality and creative images conditioned on text. An interesting application of such models is structure-preserving, text-guided image editing. Existing approaches rely on text-conditioned diffusion models such as Stable Diffusion or Imagen and require compute-intensive optimization of text embeddings or fine-tuning of the model weights for text-guided image editing. We explore text-guided image editing with a Hybrid Diffusion Model (HDM) architecture similar to DALLE-2. Our architecture consists of a diffusion prior model that generates CLIP image embeddings conditioned on a text prompt and a custom Latent Diffusion Model trained to generate images conditioned on CLIP image embeddings. We discover that the diffusion prior model can be used to perform text-guided conceptual edits in the CLIP image embedding space without any fine-tuning or optimization. We combine this with structure-preserving edits on the image decoder using existing approaches such as reverse DDIM to perform text-guided image editing. Our approach, PRedItOR, does not require additional inputs, fine-tuning, optimization or objectives, and it shows results on par with or better than baselines, both qualitatively and quantitatively. We provide further analysis and understanding of the diffusion prior model and believe this opens up new possibilities in diffusion models research.
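
The abstract names reverse DDIM (deterministic DDIM inversion) as the structure-preserving edit applied on the image decoder. The following is a minimal NumPy sketch of that generic technique under stated assumptions, not the authors' implementation: the linear beta schedule, the function names (`make_schedule`, `ddim_invert`, `ddim_sample`, `ddim_edit`) and the `toy_eps` noise predictor, which stands in for a latent diffusion decoder conditioned on a CLIP image embedding, are all hypothetical. A latent is inverted to noise under the source conditioning and then regenerated under the edited conditioning, so the deterministic noise trajectory carries the structure of the input.

```python
import numpy as np


def make_schedule(num_steps=50, train_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Return sampled timesteps and their cumulative alphas (assumed linear beta schedule)."""
    betas = np.linspace(beta_start, beta_end, train_steps)
    alpha_bars = np.cumprod(1.0 - betas)
    ts = np.linspace(0, train_steps - 1, num_steps).round().astype(int)
    return ts, alpha_bars[ts]


def ddim_move(x, eps, a_from, a_to):
    """Deterministic (eta = 0) DDIM update between two cumulative-alpha levels."""
    x0_pred = (x - np.sqrt(1.0 - a_from) * eps) / np.sqrt(a_from)
    return np.sqrt(a_to) * x0_pred + np.sqrt(1.0 - a_to) * eps


def ddim_invert(x0, eps_fn, cond, ts, ab):
    """Reverse DDIM: walk a clean latent up to the noisiest level under `cond`."""
    x, a_prev = x0, 1.0  # alpha_bar = 1 means "no noise"
    for t, a in zip(ts, ab):
        # Approximation: the noise for step t is predicted from the current,
        # less noisy latent, because the latent at step t is not known yet.
        x = ddim_move(x, eps_fn(x, t, cond), a_prev, a)
        a_prev = a
    return x


def ddim_sample(xT, eps_fn, cond, ts, ab):
    """Deterministic DDIM sampling from the noisiest level back to a clean latent."""
    x = xT
    for i in reversed(range(len(ts))):
        a_to = ab[i - 1] if i > 0 else 1.0
        x = ddim_move(x, eps_fn(x, ts[i], cond), ab[i], a_to)
    return x


def ddim_edit(x0, eps_fn, src_cond, tgt_cond, ts, ab):
    """Invert under the source conditioning, then regenerate under the edited one."""
    xT = ddim_invert(x0, eps_fn, src_cond, ts, ab)
    return ddim_sample(xT, eps_fn, tgt_cond, ts, ab)


def toy_eps(x, t, cond):
    """Hypothetical stand-in for the decoder's noise predictor: ignores x and t."""
    return np.broadcast_to(cond, np.shape(x)).astype(float)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    ts, ab = make_schedule()
    latent = rng.standard_normal(8)    # stands in for an image latent
    src_emb = rng.standard_normal(8)   # stands in for the source CLIP image embedding
    edit_emb = rng.standard_normal(8)  # stands in for an edited CLIP image embedding
    recon = ddim_edit(latent, toy_eps, src_emb, src_emb, ts, ab)
    edited = ddim_edit(latent, toy_eps, src_emb, edit_emb, ts, ab)
    print(np.max(np.abs(recon - latent)), np.max(np.abs(edited - latent)))
```

Because `toy_eps` ignores `x`, inverting and resampling under the same conditioning reconstructs the input exactly. With a real network the noise is predicted from the less noisy latent during inversion, so reconstruction is only approximate.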

updated: Mon Mar 20 2023 22:00:42 GMT+0000 (UTC)

published: Wed Feb 15 2023 22:58:11 GMT+0000 (UTC)

arXiv