HumanSD: A Native Skeleton-Guided Diffusion Model for Human Image Generation

Xuan Ju; Ailing Zeng; Chenchen Zhao; Jianan Wang; Lei Zhang; Qiang Xu

Controllable human image generation (HIG) has numerous real-life applications. State-of-the-art solutions, such as ControlNet and T2I-Adapter, add a learnable branch on top of the frozen, pre-trained Stable Diffusion (SD) model; this branch can enforce various conditions, including skeleton guidance for HIG. While such a plug-and-play approach is appealing, the images produced by the frozen SD branch inevitably and unpredictably conflict with the given condition, which poses a significant challenge for the learnable branch, since that branch essentially edits image features to enforce the condition. In this work, we propose HumanSD, a native skeleton-guided diffusion model for controllable HIG. Instead of performing image editing with dual-branch diffusion, we fine-tune the original SD model using a novel heatmap-guided denoising loss. This strategy effectively and efficiently strengthens the given skeleton condition during model training while mitigating catastrophic forgetting. HumanSD is fine-tuned on a combination of three large-scale human-centric datasets with text-image-pose information, two of which are established in this work. As shown in Figure 1, HumanSD outperforms ControlNet in both pose-control accuracy and image quality, particularly when the given skeleton guidance is sophisticated.
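
The abstract names a heatmap-guided denoising loss without giving its form. As a rough illustration only, one plausible reading is a standard noise-prediction mean squared error in which each spatial location is up-weighted by a heatmap value. The sketch below assumes exactly that; the function name, the `1 + weight * heatmap` weighting, the default `weight`, and the array shapes are all hypothetical choices made for this example and are not taken from the paper.

```python
import numpy as np


def heatmap_weighted_denoising_loss(noise, noise_pred, heatmap, weight=0.1):
    """Mean squared denoising error, up-weighted where the heatmap is large.

    Hypothetical sketch, not the paper's definition:
      noise, noise_pred: arrays of shape (C, H, W), true and predicted noise.
      heatmap: array of shape (H, W) with non-negative values, assumed to be
               large near the regions the skeleton condition cares about.
      weight: assumed scalar controlling how strongly the heatmap matters.
    """
    noise = np.asarray(noise, dtype=np.float64)
    noise_pred = np.asarray(noise_pred, dtype=np.float64)
    heatmap = np.asarray(heatmap, dtype=np.float64)

    # Per-location weights; the (H, W) map broadcasts over the channel axis.
    pixel_weights = 1.0 + weight * heatmap

    squared_error = (noise - noise_pred) ** 2
    return float(np.mean(pixel_weights * squared_error))
```

With an all-zero heatmap this reduces to the ordinary mean squared error, so under this assumed form the weighting acts as an additive emphasis on top of the usual denoising objective rather than a replacement for it.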

updated: Sun Apr 09 2023 16:21:43 GMT+0000 (UTC)

published: Sun Apr 09 2023 16:21:43 GMT+0000 (UTC)

arXiv
