Adding Conditional Control to Text-to-Image Diffusion Models

Lvmin Zhang; Maneesh Agrawala

We present a neural network structure, ControlNet, to control pretrained large diffusion models to support additional input conditions. ControlNet learns task-specific conditions in an end-to-end way, and the learning is robust even when the training dataset is small (< 50k). Moreover, training a ControlNet is as fast as fine-tuning a diffusion model, and the model can be trained on a personal device. Alternatively, if powerful computation clusters are available, the model can scale to large amounts (millions to billions) of data. We report that large diffusion models like Stable Diffusion can be augmented with ControlNets to enable conditional inputs such as edge maps, segmentation maps, and keypoints. This may enrich the methods to control large diffusion models and further facilitate related applications.

updated: Fri Feb 10 2023 23:12:37 GMT+0000 (UTC)

published: Fri Feb 10 2023 23:12:37 GMT+0000 (UTC)

arXiv
