Vid2CAD: CAD Model Alignment using Multi-View Constraints from Videos

Kevis-Kokitsi Maninis; Stefan Popov; Matthias Nießner; Vittorio Ferrari

We address the task of aligning CAD models to a video sequence of a complex scene containing multiple objects. Our method can process arbitrary videos and fully automatically recover the 9 DoF pose for each object appearing in them, thus aligning the objects in a common 3D coordinate frame. The core idea of our method is to integrate neural network predictions from individual frames with a temporally global, multi-view constraint optimization formulation. This integration process resolves the scale and depth ambiguities in the per-frame predictions, and generally improves the estimate of all pose parameters. By leveraging multi-view constraints, our method also resolves occlusions and handles objects that are out of view in individual frames, thus reconstructing all objects into a single globally consistent CAD representation of the scene. In comparison to the state-of-the-art single-frame method Mask2CAD that we build on, we achieve substantial improvements on the Scan2CAD dataset (from 11.6% to 30.7% class average accuracy).
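To illustrate why multi-view constraints can resolve the depth ambiguity of per-frame predictions, the following is a minimal sketch and not the optimization used in the paper. It assumes each frame contributes a known camera center and a viewing ray toward the predicted object center, and it finds the 3D point closest to all rays in the least-squares sense. The names `triangulate_center`, `cam_centers`, and `ray_dirs` are hypothetical and chosen only for this example.

```python
from typing import Sequence, Tuple

Vec3 = Tuple[float, float, float]


def _normalize(v: Sequence[float]) -> Vec3:
    """Scale a 3-vector to unit length."""
    n = sum(x * x for x in v) ** 0.5
    return (v[0] / n, v[1] / n, v[2] / n)


def _det3(m) -> float:
    """Determinant of a 3x3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def triangulate_center(cam_centers: Sequence[Vec3],
                       ray_dirs: Sequence[Vec3]) -> Vec3:
    """Least-squares 3D point closest to all rays c_i + t * d_i.

    Each ray contributes the projector P_i = I - d_i d_i^T, and the
    point p solves (sum_i P_i) p = sum_i P_i c_i.
    """
    A = [[0.0] * 3 for _ in range(3)]
    b = [0.0] * 3
    for c, d in zip(cam_centers, ray_dirs):
        d = _normalize(d)
        for r in range(3):
            for k in range(3):
                p_rk = (1.0 if r == k else 0.0) - d[r] * d[k]
                A[r][k] += p_rk
                b[r] += p_rk * c[k]

    det = _det3(A)
    if abs(det) < 1e-9:
        # A single ray (or parallel rays) leaves depth unconstrained.
        raise ValueError("rays are (near-)parallel; depth is not observable")

    # Solve the 3x3 system with Cramer's rule.
    solution = []
    for col in range(3):
        M = [row[:] for row in A]
        for r in range(3):
            M[r][col] = b[r]
        solution.append(_det3(M) / det)
    return (solution[0], solution[1], solution[2])
```

With a single frame the system is singular, which mirrors the depth ambiguity mentioned in the abstract; adding rays from other viewpoints makes the point well determined. A full 9 DoF alignment would also have to estimate rotation and scale, which this sketch does not attempt.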

updated: Tue Jan 25 2022 10:20:29 GMT+0000 (UTC)

published: Tue Dec 08 2020 18:57:45 GMT+0000 (UTC)

arXiv
