Tune-A-Video: One-Shot Tuning of Image Diffusion Models for Text-to-Video Generation

Jay Zhangjie Wu; Yixiao Ge; Xintao Wang; Weixian Lei; Yuchao Gu; Yufei Shi; Wynne Hsu; Ying Shan; Xiaohu Qie; Mike Zheng Shou

To replicate the success of text-to-image (T2I) generation, recent works employ large-scale video datasets to train a text-to-video (T2V) generator. Despite their promising results, such a paradigm is computationally expensive. In this work, we propose a new T2V generation setting, One-Shot Video Tuning, in which only one text-video pair is presented. Our model is built on state-of-the-art T2I diffusion models pre-trained on massive image data. We make two key observations: 1) T2I models can generate still images that represent verb terms; 2) extending T2I models to generate multiple images concurrently exhibits surprisingly good content consistency. To further learn continuous motion, we introduce Tune-A-Video, which involves a tailored spatio-temporal attention mechanism and an efficient one-shot tuning strategy. At inference, we employ DDIM inversion to provide structure guidance for sampling. Extensive qualitative and numerical experiments demonstrate the remarkable ability of our method across various applications.
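
The abstract states that DDIM inversion provides structure guidance at inference. The sketch below illustrates generic DDIM inversion. It is not the authors' implementation. It runs the deterministic DDIM update in the noising direction to map a clean latent to a noisy one, then runs the ordinary DDIM sampler back down. Several details are assumptions made for illustration: the linear beta schedule, the function names, and `eps_model`, a hypothetical stand-in for the text-conditioned noise predictor.

```python
import numpy as np

def make_alpha_bars(num_steps=50, beta_start=1e-4, beta_end=0.02):
    # Assumed linear beta schedule; alpha_bar[t] is the cumulative product of (1 - beta).
    betas = np.linspace(beta_start, beta_end, num_steps)
    return np.cumprod(1.0 - betas)

def ddim_step(x, eps, a_from, a_to):
    # Deterministic DDIM update (eta = 0): estimate the clean sample at noise
    # level a_from, then re-noise it to level a_to with the same predicted noise.
    x0_pred = (x - np.sqrt(1.0 - a_from) * eps) / np.sqrt(a_from)
    return np.sqrt(a_to) * x0_pred + np.sqrt(1.0 - a_to) * eps

def ddim_invert(x0, eps_model, alpha_bars):
    # Walk from the clean latent (alpha_bar = 1) up to the noisiest level.
    # The noise is predicted at the current, less noisy latent, which is the
    # usual approximation that makes inversion inexact for real models.
    x, a_prev = x0, 1.0
    for t in range(len(alpha_bars)):
        x = ddim_step(x, eps_model(x, t), a_prev, alpha_bars[t])
        a_prev = alpha_bars[t]
    return x

def ddim_sample(x_T, eps_model, alpha_bars):
    # Standard deterministic DDIM sampling from the noisiest level back to t = 0.
    x = x_T
    for t in reversed(range(len(alpha_bars))):
        a_to = alpha_bars[t - 1] if t > 0 else 1.0
        x = ddim_step(x, eps_model(x, t), alpha_bars[t], a_to)
    return x

# Toy usage: 4 "frames" of 8x8 latents and a hypothetical noise predictor that
# always returns the same noise, for which inversion followed by sampling is exact.
rng = np.random.default_rng(0)
latents = rng.standard_normal((4, 8, 8))
fixed_eps = rng.standard_normal(latents.shape)
toy_eps_model = lambda x, t: fixed_eps

alpha_bars = make_alpha_bars()
noisy = ddim_invert(latents, toy_eps_model, alpha_bars)
recon = ddim_sample(noisy, toy_eps_model, alpha_bars)
```

With a learned, input-dependent noise predictor the round trip is only approximate. The inverted latent still carries the layout of the source video, which is the sense in which it can guide sampling under a new prompt.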

updated: Fri Mar 17 2023 17:28:04 GMT+0000 (UTC)

published: Thu Dec 22 2022 09:43:36 GMT+0000 (UTC)

arXiv
