Few-View Object Reconstruction with Unknown Categories and Camera Poses

Hanwen Jiang; Zhenyu Jiang; Kristen Grauman; Yuke Zhu

While object reconstruction has made great strides in recent years, current methods typically require densely captured images and/or known camera poses, and generalize poorly to novel object categories. To step toward object reconstruction in the wild, this work explores reconstructing general real-world objects from a few images without known camera poses or object categories. The crux of our work is solving two fundamental 3D vision problems -- shape reconstruction and pose estimation -- in a unified approach. Our approach captures the synergies of these two problems: reliable camera pose estimation gives rise to accurate shape reconstruction, and the accurate reconstruction, in turn, induces robust correspondence between different views and facilitates pose estimation. Our method FORGE predicts 3D features from each view and leverages them in conjunction with the input images to establish cross-view correspondence for estimating relative camera poses. The 3D features are then transformed by the estimated poses into a shared space and are fused into a neural radiance field. The reconstruction results are rendered by volume rendering techniques, enabling us to train the model without 3D shape ground-truth. Our experiments show that FORGE reliably reconstructs objects from five views. Our pose estimation method outperforms existing ones by a large margin. The reconstruction results under predicted poses are comparable to the ones using ground-truth poses. The performance on novel testing categories matches the results on categories seen during training. Project page: https://ut-austin-rpl.github.io/FORGE/
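
The abstract states that reconstructions are rendered with volume rendering, which is what allows training from images alone. As a minimal sketch of that idea, the following shows the standard emission-absorption compositing rule commonly used with neural radiance fields, applied to a single ray. It is not FORGE's implementation: the function name `composite_ray` and its parameters are hypothetical, and the per-sample densities and colors are assumed to have already been queried from some radiance field.

```python
import math


def composite_ray(densities, colors, deltas):
    """Alpha-composite samples along one ray, ordered front to back.

    densities: non-negative volume density at each sample (assumed given)
    colors:    (r, g, b) tuple at each sample (assumed given)
    deltas:    distance between consecutive samples
    Returns the rendered RGB value and the per-sample weights.
    """
    transmittance = 1.0  # fraction of light that has not yet been absorbed
    rgb = [0.0, 0.0, 0.0]
    weights = []
    for sigma, color, delta in zip(densities, colors, deltas):
        # Opacity contributed by this segment of the ray.
        alpha = 1.0 - math.exp(-sigma * delta)
        weight = transmittance * alpha
        weights.append(weight)
        for channel in range(3):
            rgb[channel] += weight * color[channel]
        # Light remaining for the samples behind this one.
        transmittance *= 1.0 - alpha
    return rgb, weights
```

Because the rendered pixel is a differentiable function of the densities and colors, a photometric loss against the input images can supervise the underlying 3D representation, which is consistent with the abstract's point that no ground-truth 3D shape is needed.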

updated: Thu Jan 25 2024 21:57:52 GMT+0000 (UTC)

published: Thu Dec 08 2022 18:59:02 GMT+0000 (UTC)

arXiv
