DreamPose: Fashion Image-to-Video Synthesis via Stable Diffusion

Johanna Karras; Aleksander Holynski; Ting-Chun Wang; Ira Kemelmacher-Shlizerman

We present DreamPose, a diffusion-based method for generating animated fashion videos from still images. Given an image and a sequence of human body poses, our method synthesizes a video containing both human and fabric motion. To achieve this, we transform a pretrained text-to-image model (Stable Diffusion) into a pose- and image-guided video synthesis model, using a novel fine-tuning strategy, a set of architectural changes to support the added conditioning signals, and techniques to encourage temporal consistency. We fine-tune on a collection of fashion videos from the UBC Fashion dataset. We evaluate our method on a variety of clothing styles and poses, and demonstrate that it produces state-of-the-art results on fashion video animation. Video results are available on our project page.

updated: Thu May 04 2023 22:29:51 GMT+0000 (UTC)

published: Wed Apr 12 2023 17:59:17 GMT+0000 (UTC)

arXiv
