SAOR: Single-View Articulated Object Reconstruction

Mehmet Aygün; Oisin Mac Aodha

We introduce SAOR, a novel approach for estimating the 3D shape, texture, and viewpoint of an articulated object from a single image captured in the wild. Unlike prior approaches that rely on pre-defined category-specific 3D templates or tailored 3D skeletons, SAOR learns to articulate shapes from single-view image collections with a skeleton-free part-based model without requiring any 3D object shape priors. To prevent ill-posed solutions, we propose a cross-instance consistency loss that exploits disentangled object shape deformation and articulation. This is helped by a new silhouette-based sampling mechanism to enhance viewpoint diversity during training. Our method only requires estimated object silhouettes and relative depth maps from off-the-shelf pre-trained networks during training. At inference time, given a single-view image, it efficiently outputs an explicit mesh representation. We obtain improved qualitative and quantitative results on challenging quadruped animals compared to relevant existing work.

updated: Thu Mar 23 2023 17:59:35 GMT+0000 (UTC)

published: Thu Mar 23 2023 17:59:35 GMT+0000 (UTC)

arXiv
