CenterSnap: Single-Shot Multi-Object 3D Shape Reconstruction and Categorical 6D Pose and Size Estimation

Muhammad Zubair Irshad; Thomas Kollar; Michael Laskey; Kevin Stone; Zsolt Kira

This paper studies the complex task of simultaneous multi-object 3D reconstruction and 6D pose and size estimation from a single-view RGB-D observation. In contrast to instance-level pose estimation, we focus on the more challenging problem in which CAD models are not available at inference time. Existing approaches mainly follow a complex multi-stage pipeline that first localizes and detects each object instance in the image and then regresses to either its 3D mesh or its 6D pose. These approaches suffer from high computational cost and low performance in complex multi-object scenarios where occlusions can be present. Hence, we present a simple one-stage approach that predicts the 3D shape and jointly estimates the 6D pose and size in a bounding-box-free manner. In particular, our method treats object instances as spatial centers, where each center denotes the complete shape of an object along with its 6D pose and size. Through this per-pixel representation, our approach can reconstruct multiple novel object instances in real time (40 FPS) and predict their 6D poses and sizes in a single forward pass. Through extensive experiments, we demonstrate that our approach significantly outperforms all shape completion and categorical 6D pose and size estimation baselines on the multi-object ShapeNet and NOCS datasets, respectively, with a 12.6% absolute improvement in mAP for 6D pose on novel real-world object instances.
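
The "objects as spatial centers" idea can be illustrated with a minimal sketch. It is not the authors' implementation; it assumes a network that outputs a 2D center heatmap plus a per-pixel parameter map, and it reads one parameter vector at each heatmap peak. The function name `extract_centers`, the 3x3 peak test, the threshold value, and the array layout are all hypothetical choices made for illustration; the abstract does not specify them.

```python
import numpy as np


def extract_centers(heatmap, params, threshold=0.5):
    """Find local peaks in a float center heatmap and gather per-pixel vectors.

    heatmap: (H, W) float array of center scores (assumed layout).
    params:  (H, W, D) array holding one D-dimensional vector per pixel
             (assumed to encode shape, pose and size; layout is hypothetical).
    Returns (K, 2) integer peak coordinates as (row, col) and the (K, D)
    vectors stored at those peaks.
    """
    h, w = heatmap.shape
    # Pad with -inf so border pixels can still be peaks.
    padded = np.pad(heatmap, 1, mode="constant", constant_values=-np.inf)
    # Stack the nine shifted views that make up each 3x3 neighbourhood.
    neighbourhood = np.stack(
        [padded[dr:dr + h, dc:dc + w] for dr in range(3) for dc in range(3)]
    )
    local_max = neighbourhood.max(axis=0)
    # A peak is at least as large as its neighbours and above the threshold.
    is_peak = (heatmap >= local_max) & (heatmap > threshold)
    rows, cols = np.nonzero(is_peak)
    return np.stack([rows, cols], axis=1), params[rows, cols]
```

Because every detected object is read out of the same pair of output maps, no per-object cropping or second-stage network is needed in this sketch, which is the sense in which a center-based representation allows all objects to be handled in one forward pass.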

updated: Thu Mar 03 2022 18:59:04 GMT+0000 (UTC)

published: Thu Mar 03 2022 18:59:04 GMT+0000 (UTC)

arXiv
