Compositional Video Synthesis with Action Graphs

Amir Bar; Roei Herzig; Xiaolong Wang; Anna Rohrbach; Gal Chechik; Trevor Darrell; Amir Globerson

Videos of actions are complex signals containing rich compositional structure in space and time. Current video generation methods lack the ability to condition the generation on multiple coordinated and potentially simultaneous timed actions. To address this challenge, we propose to represent the actions in a graph structure called Action Graph and present the new "Action Graph To Video" synthesis task. Our generative model for this task (AG2Vid) disentangles motion and appearance features and, by incorporating a scheduling mechanism for actions, facilitates timely and coordinated video generation. We train and evaluate AG2Vid on the CATER and Something-Something V2 datasets, and show that the resulting videos have better visual quality and semantic consistency compared to baselines. Finally, our model demonstrates zero-shot abilities by synthesizing novel compositions of the learned actions. For code and pretrained models, see the project page: https://roeiherz.github.io/AG2Video
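
To make the idea of timed, possibly simultaneous actions concrete, the following is a minimal Python sketch of one way such a graph could be represented: objects as nodes, actions as directed edges annotated with a start and end frame. It is an illustration only and not the authors' implementation; the class names (`TimedAction`, `ActionGraph`), the method `active_at`, and the example objects and actions are all assumptions introduced here.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass(frozen=True)
class TimedAction:
    """Hypothetical edge: `subject` performs `action` on `target` over a frame range."""
    subject: str
    action: str
    target: str
    start: int  # first frame on which the action is active (inclusive)
    end: int    # last frame on which the action is active (inclusive)


@dataclass
class ActionGraph:
    """Hypothetical container: objects are nodes, timed actions are edges."""
    objects: Set[str] = field(default_factory=set)
    actions: List[TimedAction] = field(default_factory=list)

    def add_action(self, subject: str, action: str, target: str,
                   start: int, end: int) -> None:
        if start > end:
            raise ValueError("start must not exceed end")
        # Register both endpoints as nodes, then store the timed edge.
        self.objects.update((subject, target))
        self.actions.append(TimedAction(subject, action, target, start, end))

    def active_at(self, frame: int) -> List[TimedAction]:
        """Return the actions scheduled to be in progress on `frame`."""
        return [a for a in self.actions if a.start <= frame <= a.end]


# Illustrative example: two actions whose frame ranges overlap on frames 5 to 9.
graph = ActionGraph()
graph.add_action("cone", "slide", "cube", start=0, end=9)
graph.add_action("ball", "rotate", "ball", start=5, end=14)
```

Querying `graph.active_at(7)` returns both actions, which is the sense in which a generator conditioned on this structure would need to render simultaneous actions in a coordinated way. How AG2Vid actually encodes and schedules the graph is described in the paper and the project page, not in this sketch.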

updated: Thu Jun 10 2021 21:07:15 GMT+0000 (UTC)

published: Sat Jun 27 2020 09:39:04 GMT+0000 (UTC)

arXiv
