Unsupervised Discovery of 3D Physical Objects from Video

Yilun Du; Kevin Smith; Tomer Ullman; Joshua Tenenbaum; Jiajun Wu

We study the problem of unsupervised physical object discovery. While existing frameworks aim to decompose scenes into 2D segments based on each object's appearance, we explore how physics, especially object interactions, facilitates disentangling the 3D geometry and position of objects from video in an unsupervised manner. Drawing inspiration from developmental psychology, our Physical Object Discovery Network (POD-Net) uses both multi-scale pixel cues and physical motion cues to accurately segment observable and partially occluded objects of varying sizes, and to infer properties of those objects. Our model reliably segments objects in both synthetic and real scenes. The discovered object properties can also be used to reason about physical events.

updated: Mon Mar 22 2021 16:06:37 GMT+0000 (UTC)

published: Fri Jul 24 2020 04:46:21 GMT+0000 (UTC)

arXiv
