UniTune: Text-Driven Image Editing by Fine Tuning an Image Generation Model on a Single Image

Dani Valevski; Matan Kalman; Yossi Matias; Yaniv Leviathan

We present UniTune, a simple and novel method for general text-driven image editing. UniTune takes as input an arbitrary image and a textual edit description, and carries out the edit while maintaining high semantic and visual fidelity to the input image. UniTune uses text, an intuitive interface for art direction, and does not require additional inputs such as masks or sketches. At the core of our method is the observation that, with the right choice of parameters, we can fine-tune a large text-to-image diffusion model on a single image, encouraging the model to maintain fidelity to the input image while still allowing expressive manipulations. We used Imagen as our text-to-image model, but we expect UniTune to work with other large-scale models as well. We test our method on a range of use cases and demonstrate its wide applicability.

updated: Wed Oct 19 2022 17:35:46 GMT+0000 (UTC)

published: Mon Oct 17 2022 23:46:05 GMT+0000 (UTC)

arXiv
