Mix-of-Show: Decentralized Low-Rank Adaptation for Multi-Concept Customization of Diffusion Models

Yuchao Gu; Xintao Wang; Jay Zhangjie Wu; Yujun Shi; Yunpeng Chen; Zihan Fan; Wuyou Xiao; Rui Zhao; Shuning Chang; Weijia Wu; Yixiao Ge; Ying Shan; Mike Zheng Shou

Public large-scale text-to-image diffusion models, such as Stable Diffusion, have gained significant attention from the community. These models can be easily customized for new concepts using low-rank adaptations (LoRAs). However, using multiple concept LoRAs to jointly support multiple customized concepts presents a challenge. We refer to this scenario as decentralized multi-concept customization, which involves single-client concept tuning and center-node concept fusion. In this paper, we propose a new framework called Mix-of-Show that addresses the challenges of decentralized multi-concept customization, including concept conflicts resulting from existing single-client LoRA tuning and identity loss during model fusion. Mix-of-Show adopts an embedding-decomposed LoRA (ED-LoRA) for single-client tuning and gradient fusion for the center node to preserve the in-domain essence of single concepts and support theoretically limitless concept fusion. Additionally, we introduce regionally controllable sampling, which extends spatially controllable sampling (e.g., ControlNet and T2I-Adapter) to address attribute binding and missing object problems in multi-concept sampling. Extensive experiments demonstrate that Mix-of-Show is capable of composing multiple customized concepts with high fidelity, including characters, objects, and scenes.
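
For readers unfamiliar with LoRA, the sketch below illustrates the general idea of a low-rank update (a frozen weight W0 plus a product B·A of two thin matrices) and the naive way of combining several concept LoRAs by a weighted sum of their updates. It is only a toy illustration under stated assumptions: the function names (`matmul`, `lora_delta`, `merge_loras`), the 2x2 shapes, and the equal weights are hypothetical, and this is the plain weighted-sum baseline, not the ED-LoRA or gradient fusion procedure proposed in the paper.

```python
def matmul(X, Y):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]


def lora_delta(B, A):
    """Low-rank update B @ A, with B of shape (d_out, r) and A of shape (r, d_in)."""
    return matmul(B, A)


def merge_loras(W0, loras, weights):
    """Naive fusion (assumed baseline): W0 + sum_i weights[i] * (B_i @ A_i).

    W0 is copied, so the frozen base weight is left unmodified.
    """
    W = [row[:] for row in W0]
    for w, (B, A) in zip(weights, loras):
        delta = lora_delta(B, A)
        for i, row in enumerate(delta):
            for j, value in enumerate(row):
                W[i][j] += w * value
    return W


# Toy example: a 2x2 base weight and two rank-1 "concept" LoRAs.
W0 = [[1.0, 0.0], [0.0, 1.0]]
lora_a = ([[1.0], [0.0]], [[0.0, 2.0]])  # B is 2x1, A is 1x2
lora_b = ([[0.0], [1.0]], [[4.0, 0.0]])
W_merged = merge_loras(W0, [lora_a, lora_b], weights=[0.5, 0.5])
print(W_merged)  # [[1.0, 1.0], [2.0, 1.0]]
```

Because each concept's update is scaled down by its weight before being added, every concept contributes only part of what it learned in isolation. That is one intuition for why a simple weighted sum can weaken individual concepts as more of them are merged.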

updated: Mon May 29 2023 17:58:16 GMT+0000 (UTC)

published: Mon May 29 2023 17:58:16 GMT+0000 (UTC)

arXiv
