Simple Video Generation using Neural ODEs

David Kanaa; Vikram Voleti; Samira Ebrahimi Kahou; Christopher Pal

Despite extensive study, conditional generation of sequences of frames, or videos, remains extremely challenging. It is commonly believed that a key step towards solving this task lies in accurately modelling both the spatial and the temporal information in video signals. A promising direction, suggested in recent literature, is to learn latent variable models that predict the future in latent space and project back to pixels. Following this line of work and building on Neural ODE, a family of models introduced in prior work, we investigate an approach that models time-continuous dynamics over a continuous latent space using a differential equation with respect to time. The intuition is that the resulting trajectories in latent space can then be extrapolated to generate video frames beyond the time steps on which the model was trained. We show that our approach yields promising results on future frame prediction on the Moving MNIST dataset with 1 and 2 digits.
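The core idea in the abstract, a latent state z(t) that evolves according to dz/dt = f(z, t) and can be integrated past the training horizon, can be illustrated with a minimal sketch. This is not the authors' implementation. The dynamics function `latent_dynamics` is a hypothetical hand-written rotation field standing in for a learned network, the 2-D latent state is an arbitrary choice, the fixed-step Runge-Kutta solver is an assumption, and the encoder and decoder that would map frames to and from latent space are omitted entirely.

```python
import math


def latent_dynamics(z, t):
    """Hypothetical stand-in for a learned network f(z, t).

    A fixed 2-D rotation field, dz/dt = (-z[1], z[0]), whose exact
    solution from z0 = (1, 0) is (cos t, sin t).
    """
    return [-z[1], z[0]]


def rk4_step(f, z, t, dt):
    """Advance the state z by one classical fourth-order Runge-Kutta step."""
    k1 = f(z, t)
    k2 = f([zi + 0.5 * dt * ki for zi, ki in zip(z, k1)], t + 0.5 * dt)
    k3 = f([zi + 0.5 * dt * ki for zi, ki in zip(z, k2)], t + 0.5 * dt)
    k4 = f([zi + dt * ki for zi, ki in zip(z, k3)], t + dt)
    return [
        zi + dt / 6.0 * (a + 2.0 * b + 2.0 * c + d)
        for zi, a, b, c, d in zip(z, k1, k2, k3, k4)
    ]


def integrate(f, z0, times, substeps=20):
    """Return the latent state at each entry of `times`, with z0 at times[0]."""
    z = list(z0)
    trajectory = [list(z)]
    for t_start, t_end in zip(times, times[1:]):
        dt = (t_end - t_start) / substeps
        t = t_start
        for _ in range(substeps):
            z = rk4_step(f, z, t, dt)
            t += dt
        trajectory.append(list(z))
    return trajectory


# Time stamps 0.0 .. 1.0 play the role of the steps seen in training;
# 1.1 .. 2.0 are extrapolated simply by continuing the integration.
all_times = [0.1 * i for i in range(21)]
latents = integrate(latent_dynamics, [1.0, 0.0], all_times)
final = latents[-1]
print("latent state at t = 2.0:", final)
print("exact solution at t = 2.0:", [math.cos(2.0), math.sin(2.0)])
```

In a full model, each entry of `latents` would be passed through a decoder to produce a frame. Because the solver accepts arbitrary time stamps, frames could in principle be requested at times between or beyond those used in training.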

updated: Tue Sep 07 2021 19:03:33 GMT+0000 (UTC)

published: Tue Sep 07 2021 19:03:33 GMT+0000 (UTC)

arXiv
