Hierarchical Video Generation for Complex Data

Lluis Castrejon; Nicolas Ballas; Aaron Courville

Videos can often be created by first outlining a global description of the scene and then adding local details. Inspired by this, we propose a hierarchical model for video generation that follows a coarse-to-fine approach. Our model first generates a low-resolution video that establishes the global scene structure, which subsequent levels in the hierarchy then refine. We train each level in the hierarchy sequentially on partial views of the videos. This reduces the computational complexity of our generative model and lets it scale to high-resolution videos beyond a few frames. We validate our approach on Kinetics-600 and BDD100K, for which we train a three-level model capable of generating 256x256 videos with 48 frames.
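The arithmetic behind the coarse-to-fine idea can be sketched in a few lines of Python. This is not the authors' implementation: the abstract only states that there are three levels and that the final output is 256x256 with 48 frames. The 2x spatial factor per level, the constant frame count across levels, the reading of "partial views" as temporal windows, the 12-frame window, and every name below (`Level`, `build_hierarchy`, `partial_view`) are assumptions made purely for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Level:
    """Output shape of one level in a hypothetical coarse-to-fine hierarchy."""
    frames: int
    height: int
    width: int

    def elements(self, channels: int = 3) -> int:
        # Number of scalar values in one video at this level.
        return self.frames * self.height * self.width * channels


def build_hierarchy(final: Level, num_levels: int, spatial_factor: int = 2) -> List[Level]:
    """Return levels ordered coarsest to finest.

    Assumption: each level upsamples height and width by `spatial_factor`
    and keeps the frame count fixed. The abstract does not specify this.
    """
    levels = [final]
    for _ in range(num_levels - 1):
        finer = levels[0]
        coarser = Level(finer.frames,
                        finer.height // spatial_factor,
                        finer.width // spatial_factor)
        levels.insert(0, coarser)
    return levels


def partial_view(level: Level, window: int) -> Level:
    """Assumed form of a partial view: a crop of `window` consecutive frames."""
    return Level(min(window, level.frames), level.height, level.width)


if __name__ == "__main__":
    final = Level(frames=48, height=256, width=256)  # shape stated in the abstract
    for i, level in enumerate(build_hierarchy(final, num_levels=3), start=1):
        print(f"level {i}: {level.frames}x{level.height}x{level.width}, "
              f"{level.elements():,} values per video")

    # Training the finest level on a 12-frame window (hypothetical size)
    # touches a quarter of the values of a full 48-frame video.
    view = partial_view(final, window=12)
    print(f"partial view: {view.elements():,} of {final.elements():,} values")
```

Under these assumptions the coarsest level handles 16 times fewer values per video than the finest one, which is why the global scene structure is cheap to establish first, and a windowed view keeps the per-step cost of the finest level bounded regardless of the full video length.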

updated: Fri Jun 04 2021 21:03:52 GMT+0000 (UTC)

published: Fri Jun 04 2021 21:03:52 GMT+0000 (UTC)

arXiv
