DDLP: Unsupervised Object-Centric Video Prediction with Deep Dynamic Latent Particles

Tal Daniel; Aviv Tamar

We propose a new object-centric video prediction algorithm based on the deep latent particle (DLP) representation. In comparison to existing slot- or patch-based representations, DLPs model the scene using a set of keypoints with learned parameters for properties such as position and size, and are both efficient and interpretable. Our method, deep dynamic latent particles (DDLP), yields state-of-the-art object-centric video prediction results on several challenging datasets. The interpretable nature of DDLP allows us to perform "what-if" generation, that is, to predict the consequences of changing the properties of objects in the initial frames, and DLP's compact structure enables efficient diffusion-based unconditional video generation. Videos, code and pre-trained models are available at https://taldatech.github.io/ddlp-web
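To make the idea of a keypoint-based scene description and a "what-if" edit more concrete, here is a minimal sketch in Python. It is not the authors' code: the `Particle` class, its `features` field, the `what_if_move` helper, and the coordinate conventions are all hypothetical, chosen only to illustrate how a set of particles with position and size could be edited before being handed to a prediction model.

```python
from dataclasses import dataclass, replace
from typing import List, Tuple


@dataclass(frozen=True)
class Particle:
    """Hypothetical stand-in for one latent particle (not the authors' class)."""
    position: Tuple[float, float]     # (x, y) keypoint location; range is an assumption
    size: Tuple[float, float]         # (width, height); units are an assumption
    features: Tuple[float, ...] = ()  # placeholder for learned appearance features


def what_if_move(particles: List[Particle], index: int,
                 dx: float, dy: float) -> List[Particle]:
    """Return a copy of the particle set with one particle shifted by (dx, dy)."""
    edited = list(particles)
    p = edited[index]
    edited[index] = replace(p, position=(p.position[0] + dx, p.position[1] + dy))
    return edited


# A toy scene with two particles, and an edit that moves the first one.
scene = [
    Particle(position=(0.0, 0.0), size=(0.2, 0.2)),
    Particle(position=(0.5, -0.5), size=(0.1, 0.3)),
]
edited_scene = what_if_move(scene, index=0, dx=0.25, dy=0.0)
```

In the method described above, a learned dynamics model would then predict future frames from the edited particles; that step is not shown here, since the abstract does not specify its interface.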

updated: Thu Feb 08 2024 14:54:53 GMT+0000 (UTC)

published: Fri Jun 09 2023 15:17:13 GMT+0000 (UTC)

arXiv
