SinFusion: Training Diffusion Models on a Single Image or Video

Yaniv Nikankin; Niv Haim; Michal Irani

Diffusion models exhibited tremendous progress in image and video generation, exceeding GANs in quality and diversity. However, they are usually trained on very large datasets and are not naturally adapted to manipulate a given input image or video. In this paper we show how this can be resolved by training a diffusion model on a single input image or video. Our image/video-specific diffusion model (SinFusion) learns the appearance and dynamics of the single image or video, while utilizing the conditioning capabilities of diffusion models. It can solve a wide array of image/video-specific manipulation tasks. In particular, our model can learn from few frames the motion and dynamics of a single input video. It can then generate diverse new video samples of the same dynamic scene, extrapolate short videos into long ones (both forward and backward in time) and perform video upsampling. Most of these tasks are not realizable by current video-specific generation methods.

updated: Sun Mar 12 2023 13:21:28 GMT+0000 (UTC)

published: Mon Nov 21 2022 18:59:33 GMT+0000 (UTC)

arXiv
