Self-Supervised Geometric Correspondence for Category-Level 6D Object Pose Estimation in the Wild

Kaifeng Zhang; Yang Fu; Shubhankar Borse; Hong Cai; Fatih Porikli; Xiaolong Wang

While 6D object pose estimation has wide applications across computer vision and robotics, it remains far from solved due to the lack of annotations. The problem becomes even more challenging when moving to category-level 6D pose, which requires generalization to unseen instances. Current approaches are limited by their reliance on annotations obtained from simulation or collected from humans. In this paper, we overcome this barrier by introducing a self-supervised learning approach trained directly on large-scale real-world object videos for category-level 6D pose estimation in the wild. Our framework reconstructs the canonical 3D shape of an object category and learns dense correspondences between input images and the canonical shape via surface embedding. For training, we propose novel geometrical cycle-consistency losses that construct cycles across 2D-3D spaces, across different instances, and across different time steps. The learned correspondence can be applied to 6D pose estimation and to other downstream tasks such as keypoint transfer. Surprisingly, our method, without any human annotations or simulators, can achieve on-par or even better performance than previous supervised or semi-supervised methods on in-the-wild images. Our project page is: https://kywind.github.io/self-pose
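
The abstract does not give the form of the cycle-consistency losses, so the following is only an illustrative sketch of one generic way a 2D-3D cycle could be scored with surface embeddings, not the paper's actual formulation. It assumes each image pixel and each vertex of the canonical shape carries an embedding vector, maps pixels to vertices and back through softmax-normalized similarities, and penalizes round trips that do not return to the starting pixel. The names `cycle_consistency_loss`, `pixel_emb`, `vertex_emb`, and `temperature` are hypothetical.

```python
import math


def softmax(scores, temperature):
    """Numerically stable softmax over a list of similarity scores."""
    scaled = [s / temperature for s in scores]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def cycle_consistency_loss(pixel_emb, vertex_emb, temperature=0.1):
    """Mean negative log-probability of returning to the starting pixel
    after a soft pixel -> vertex -> pixel round trip.

    pixel_emb:  list of per-pixel embedding vectors (hypothetical input)
    vertex_emb: list of per-vertex embedding vectors (hypothetical input)
    """
    n_pix, n_vert = len(pixel_emb), len(vertex_emb)
    # Similarity between every pixel and every canonical-shape vertex.
    sim = [[dot(p, v) for v in vertex_emb] for p in pixel_emb]
    # Forward step (2D -> 3D): each pixel spreads probability over vertices.
    fwd = [softmax(row, temperature) for row in sim]
    # Backward step (3D -> 2D): each vertex spreads probability over pixels.
    bwd = [
        softmax([sim[i][j] for i in range(n_pix)], temperature)
        for j in range(n_vert)
    ]
    loss = 0.0
    for i in range(n_pix):
        # Probability that the round trip starting at pixel i ends at pixel i.
        p_return = sum(fwd[i][j] * bwd[j][i] for j in range(n_vert))
        loss -= math.log(p_return + 1e-12)  # small epsilon guards log(0)
    return loss / n_pix


# Distinct, matching embeddings yield a near-zero loss; identical
# (ambiguous) embeddings yield a larger one.
aligned = cycle_consistency_loss([[1.0, 0.0], [0.0, 1.0]],
                                 [[1.0, 0.0], [0.0, 1.0]])
ambiguous = cycle_consistency_loss([[1.0, 0.0], [1.0, 0.0]],
                                   [[1.0, 0.0], [1.0, 0.0]])
```

In this toy setting the loss is small only when every pixel has an unambiguous match on the shape, which is the intuition behind using a cycle as a training signal without pose labels. The cycles across instances and across time steps mentioned in the abstract are not covered by this sketch.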

updated: Thu Oct 13 2022 17:19:22 GMT+0000 (UTC)

published: Thu Oct 13 2022 17:19:22 GMT+0000 (UTC)

arXiv
