Diverse Video Generation using a Gaussian Process Trigger

Gaurav Shrivastava; Abhinav Shrivastava

Generating future frames given a few context (or past) frames is a challenging task. It requires modeling the temporal coherence of videos and multi-modality in terms of diversity in the potential future states. Current variational approaches for video generation tend to marginalize over multi-modal future outcomes. Instead, we propose to explicitly model the multi-modality in the future outcomes and leverage it to sample diverse futures. Our approach, Diverse Video Generator, uses a Gaussian Process (GP) to learn priors on future states given the past and maintains a probability distribution over possible futures given a particular sample. In addition, we leverage the changes in this distribution over time to control the sampling of diverse future states by estimating the end of ongoing sequences. That is, we use the variance of the GP over the output function space to trigger a change in an action sequence. We achieve state-of-the-art results on diverse future frame generation in terms of reconstruction quality and diversity of the generated sequences.
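The variance-based trigger described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: it assumes scalar inputs in place of latent frame encodings, a squared-exponential kernel with fixed hyperparameters, and a hypothetical rule that a high predictive variance fires the trigger. The function names (`rbf`, `solve`, `gp_predictive_variance`, `should_trigger`) and the threshold value are illustrative choices, not taken from the paper.

```python
import math

def rbf(a, b, length_scale=1.0, signal_var=1.0):
    """Squared-exponential kernel between two scalars (assumed kernel choice)."""
    return signal_var * math.exp(-0.5 * ((a - b) / length_scale) ** 2)

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    # Build the augmented matrix [A | b] without mutating the inputs.
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def gp_predictive_variance(train_x, query_x, noise_var=1e-4):
    """GP posterior variance at query_x: k(x*, x*) - k*^T (K + noise I)^-1 k*."""
    K = [[rbf(a, b) + (noise_var if i == j else 0.0)
          for j, b in enumerate(train_x)]
         for i, a in enumerate(train_x)]
    k_star = [rbf(a, query_x) for a in train_x]
    alpha = solve(K, k_star)
    return rbf(query_x, query_x) - sum(k * a for k, a in zip(k_star, alpha))

def should_trigger(train_x, query_x, threshold=0.5):
    """Hypothetical trigger: fire when the predictive variance exceeds threshold."""
    return gp_predictive_variance(train_x, query_x) > threshold
```

In this sketch, a query close to the observed inputs yields a small variance and no trigger, while a query far from them yields a variance near the prior value and fires the trigger. Under the stated assumptions, that is the kind of signal one could use to decide when to sample a new future rather than continue the current sequence.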

updated: Fri Jul 09 2021 18:15:16 GMT+0000 (UTC)

published: Fri Jul 09 2021 18:15:16 GMT+0000 (UTC)

arXiv
