NeRF-Pose: A First-Reconstruct-Then-Regress Approach for Weakly-supervised 6D Object Pose Estimation

Fu Li; Hao Yu; Ivan Shugurov; Benjamin Busam; Shaowu Yang; Slobodan Ilic

Pose estimation of 3D objects in monocular images is a fundamental and long-standing problem in computer vision. Existing deep learning approaches for 6D pose estimation typically assume that 3D object models and 6D pose annotations are available. However, precise annotation of 6D poses in real data is intricate, time-consuming and not scalable, while synthetic data scales well but lacks realism. To avoid these problems, we present a weakly-supervised reconstruction-based pipeline, named NeRF-Pose, which needs only 2D object segmentation and known relative camera poses during training. Following the first-reconstruct-then-regress idea, we first reconstruct the objects from multiple views in the form of an implicit neural representation. Then, we train a pose regression network to predict pixel-wise 2D-3D correspondences between images and the reconstructed model. At inference, the approach needs only a single image as input. A NeRF-enabled PnP+RANSAC algorithm is used to estimate a stable and accurate pose from the predicted correspondences. Experiments on LineMod and LineMod-Occlusion show that the proposed method achieves state-of-the-art accuracy in comparison to the best 6D pose estimation methods, despite being trained only with weak labels. In addition, we extend the Homebrewed DB dataset with more real training images to support the weakly supervised task and achieve compelling results on this dataset. The extended dataset and code will be released soon.
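
The role of the predicted 2D-3D correspondences in a PnP+RANSAC step can be illustrated with the reprojection test that scores a candidate pose. The following is a minimal sketch, not the NeRF-enabled variant described in the abstract: it assumes a pinhole camera with intrinsics (fx, fy, cx, cy), and the function names, threshold, and sample values are hypothetical and chosen only for illustration.

```python
import math

def project(point, R, t, fx, fy, cx, cy):
    """Project a 3D object-space point to pixel coordinates under pose (R, t)."""
    # Camera-space coordinates: X_cam = R * X_obj + t (R is a 3x3 nested list).
    xc, yc, zc = (sum(R[i][j] * point[j] for j in range(3)) + t[i] for i in range(3))
    # Pinhole projection; assumes the point lies in front of the camera (zc > 0).
    return (fx * xc / zc + cx, fy * yc / zc + cy)

def inliers(corrs, R, t, intr, thresh_px=2.0):
    """Indices of (pixel, point) correspondences with small reprojection error."""
    keep = []
    for idx, (pixel, point) in enumerate(corrs):
        u, v = project(point, R, t, *intr)
        if math.hypot(u - pixel[0], v - pixel[1]) < thresh_px:
            keep.append(idx)
    return keep

# Hypothetical data: identity rotation, object 2 units in front of the camera.
R = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
t = [0.0, 0.0, 2.0]
intr = (500.0, 500.0, 320.0, 240.0)  # fx, fy, cx, cy
corrs = [
    ((320.0, 240.0), (0.0, 0.0, 0.0)),
    ((345.0, 240.0), (0.1, 0.0, 0.0)),
    ((320.0, 265.0), (0.0, 0.1, 0.0)),
    ((100.0, 100.0), (0.1, 0.1, 0.0)),  # deliberately wrong correspondence
]
consistent = inliers(corrs, R, t, intr)
```

In a RANSAC loop, a pose hypothesis would be fitted to a small random subset of correspondences and kept if it yields the largest inlier set under a test like this one; the minimal solver itself is omitted here.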

updated: Sat Sep 09 2023 04:49:33 GMT+0000 (UTC)

published: Wed Mar 09 2022 15:28:02 GMT+0000 (UTC)

arXiv
