Diffusion Probabilistic Modeling for Video Generation

Ruihan Yang; Prakhar Srivastava; Stephan Mandt

Denoising diffusion probabilistic models are a promising new class of generative models that are competitive with GANs on perceptual metrics. In this paper, we explore their potential for sequentially generating video. Inspired by recent advances in neural video compression, we use denoising diffusion models to stochastically generate a residual to a deterministic next-frame prediction. We compare this approach to two sequential VAE and two GAN baselines on four datasets, where we test the generated frames for perceptual quality and forecasting accuracy against ground truth frames. We find significant improvements in terms of perceptual quality on all data and improvements in terms of frame forecasting for complex high-resolution videos.
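
The generation scheme described in the abstract, where each new frame is a deterministic next-frame prediction plus a stochastically sampled residual, can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: `predict_next_frame`, `eps_model`, `sample_residual`, and `generate_video` are hypothetical names, the predictor and noise network are untrained placeholders, the linear noise schedule and step count are assumptions, and frames are flat lists of pixel values for simplicity. The sampling loop follows the standard DDPM ancestral sampling rule.

```python
import math
import random


def predict_next_frame(prev_frame):
    # Hypothetical deterministic predictor: simply repeats the previous frame.
    # A real system would use a learned network here.
    return list(prev_frame)


def eps_model(x_t, t, context):
    # Hypothetical stand-in for a trained noise-prediction network that is
    # conditioned on the context (here, the deterministic prediction).
    return [0.0] * len(x_t)


def sample_residual(context, rng, num_steps=50):
    # Assumed linear beta schedule; requires num_steps >= 2.
    betas = [1e-4 + (0.02 - 1e-4) * i / (num_steps - 1) for i in range(num_steps)]
    alpha_bars, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        alpha_bars.append(running)

    # Start from pure Gaussian noise and denoise step by step.
    x = [rng.gauss(0.0, 1.0) for _ in context]
    for t in reversed(range(num_steps)):
        eps = eps_model(x, t, context)
        coef = betas[t] / math.sqrt(1.0 - alpha_bars[t])
        scale = 1.0 / math.sqrt(1.0 - betas[t])
        sigma = math.sqrt(betas[t]) if t > 0 else 0.0  # no noise on the last step
        x = [scale * (xi - coef * ei) + sigma * rng.gauss(0.0, 1.0)
             for xi, ei in zip(x, eps)]
    return x


def generate_video(first_frame, num_frames, seed=0):
    rng = random.Random(seed)
    frames = [list(first_frame)]
    for _ in range(num_frames - 1):
        mu = predict_next_frame(frames[-1])        # deterministic part
        residual = sample_residual(mu, rng)        # stochastic part
        frames.append([m + r for m, r in zip(mu, residual)])
    return frames
```

Calling `generate_video([0.0] * 16, 4)` returns four 16-pixel frames. With the placeholder networks the output is just accumulated noise; the sketch only shows where the deterministic prediction and the sampled residual enter the autoregressive loop.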

updated: Wed Mar 16 2022 03:52:45 GMT+0000 (UTC)

published: Wed Mar 16 2022 03:52:45 GMT+0000 (UTC)

arXiv
