FISEdit: Accelerating Text-to-image Editing via Cache-enabled Sparse Diffusion Inference

Zihao Yu; Haoyang Li; Fangcheng Fu; Xupeng Miao; Bin Cui

Driven by the recent success of diffusion models, text-to-image generation has become increasingly popular and now supports a wide range of applications. Among these, text-to-image editing, also called continuous text-to-image generation, has attracted considerable attention and can potentially improve the quality of generated images. Users often want to edit a generated image slightly by making minor modifications to the input text description over several rounds of diffusion inference. However, this editing process suffers from the low inference efficiency of many existing diffusion models, even with GPU accelerators. To solve this problem, we introduce Fast Image Semantically Edit (FISEdit), a cache-enabled sparse diffusion model inference engine for efficient text-to-image editing. The key intuition behind our approach is to exploit the semantic mapping between minor modifications to the input text and the affected regions of the output image. At each text editing step, FISEdit automatically identifies the affected image regions and reuses the cached feature maps of the unchanged regions to accelerate inference. Extensive empirical results show that FISEdit can be 3.4× and 4.4× faster than existing methods on NVIDIA TITAN RTX and A100 GPUs, respectively, and can even generate more satisfactory images.
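The caching idea in the abstract can be illustrated with a toy sketch. This is not the FISEdit implementation: the helper names `changed_mask` and `sparse_update`, the tolerance `tol`, and the pointwise stand-in `layer` are all hypothetical, and the abstract does not say how the affected regions are actually identified. The sketch only shows the general pattern of recomputing a layer where the input changed and reusing cached outputs everywhere else.

```python
from typing import Callable, List, Tuple

Grid = List[List[float]]
Mask = List[List[bool]]


def changed_mask(old: Grid, new: Grid, tol: float = 1e-6) -> Mask:
    """Mark positions whose input differs between two editing rounds."""
    return [
        [abs(a - b) > tol for a, b in zip(row_old, row_new)]
        for row_old, row_new in zip(old, new)
    ]


def sparse_update(
    new_input: Grid,
    cached_output: Grid,
    mask: Mask,
    layer: Callable[[float], float],
) -> Tuple[Grid, int]:
    """Recompute `layer` only where `mask` is True; reuse the cache elsewhere.

    Returns the updated output and the number of recomputed positions.
    """
    out = [row[:] for row in cached_output]  # start from the cached feature map
    recomputed = 0
    for i, mask_row in enumerate(mask):
        for j, affected in enumerate(mask_row):
            if affected:
                out[i][j] = layer(new_input[i][j])
                recomputed += 1
    return out, recomputed


def layer(x: float) -> float:
    """Hypothetical pointwise layer standing in for a real network block."""
    return 2.0 * x + 1.0


# Round 1: dense computation, and the result is cached.
old_in = [[0.0, 1.0, 2.0], [3.0, 4.0, 5.0]]
cache = [[layer(v) for v in row] for row in old_in]

# Round 2: a small edit changes a single input position.
new_in = [row[:] for row in old_in]
new_in[1][2] = 9.0

mask = changed_mask(old_in, new_in)
out, n = sparse_update(new_in, cache, mask, layer)

# The sparse result matches a full dense recomputation.
dense = [[layer(v) for v in row] for row in new_in]
assert out == dense
```

Because the stand-in layer is pointwise, recomputing only the changed positions is exact. Convolution and attention layers mix information across neighboring positions, so a real engine would need to account for that when deciding which positions to recompute.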

updated: Sat May 27 2023 09:14:03 GMT+0000 (UTC)

published: Sat May 27 2023 09:14:03 GMT+0000 (UTC)

arXiv
