Segment Anything in 3D with NeRFs

Jiazhong Cen; Zanwei Zhou; Jiemin Fang; Wei Shen; Lingxi Xie; Xiaopeng Zhang; Qi Tian

The Segment Anything Model (SAM) has demonstrated its effectiveness in segmenting any object or part in a wide range of 2D images, yet its capability in 3D has not been fully explored. The real world is composed of numerous 3D scenes and objects. Because accessible 3D data is scarce and costly to acquire and annotate, lifting SAM to 3D is a challenging but valuable research avenue. With this in mind, we propose a novel framework to Segment Anything in 3D, named SA3D. Given a neural radiance field (NeRF) model, SA3D allows users to obtain the 3D segmentation result of any target object with only one-shot manual prompting in a single rendered view. Given the input prompts, SAM cuts out the target object from the corresponding view. The resulting 2D segmentation mask is projected onto 3D mask grids via density-guided inverse rendering. 2D masks are then rendered from other views; these are mostly incomplete, but they serve as cross-view self-prompts that are fed into SAM again. SAM returns complete masks, which are in turn projected onto the mask grids. This procedure is executed iteratively, so that accurate 3D masks are eventually learned. SA3D adapts effectively to various radiance fields without any additional redesign. The entire segmentation process can be completed in approximately two minutes without any engineering optimization. Our experiments demonstrate the effectiveness of SA3D in different scenes, highlighting the potential of SAM in 3D scene perception. The project page is at https://jumpat.github.io/SA3D/.
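The projection and self-prompting steps described in the abstract can be illustrated with a minimal NumPy sketch for a single ray. This is not the authors' implementation: the function names (`render_weights`, `project_mask`, `render_mask`, `pick_point_prompt`), the additive grid update, and the "most confident pixel" prompt rule are assumptions made for illustration. Only the volume-rendering weight formula is the standard NeRF one.

```python
import numpy as np

def render_weights(sigma, delta):
    """Standard volume-rendering weights w_i = T_i * alpha_i along one ray."""
    alpha = 1.0 - np.exp(-sigma * delta)
    # Transmittance T_i: probability the ray reaches sample i unoccluded.
    trans = np.cumprod(np.concatenate(([1.0], 1.0 - alpha[:-1])))
    return trans * alpha

def project_mask(mask_grid, voxel_idx, weights, pixel_label):
    """Assumed update rule: add the density-weighted 2D label to each voxel hit."""
    np.add.at(mask_grid, voxel_idx, weights * pixel_label)
    return mask_grid

def render_mask(mask_grid, voxel_idx, weights):
    """Render the 3D mask back to one pixel with the same weights."""
    return float(np.sum(weights * mask_grid[voxel_idx]))

def pick_point_prompt(rendered_mask_2d):
    """Assumed self-prompt rule: take the most confident pixel as a point prompt."""
    row, col = np.unravel_index(np.argmax(rendered_mask_2d), rendered_mask_2d.shape)
    return int(row), int(col)

# Toy scene: a ray crosses 5 voxels; a dense surface starts at voxel 2.
sigma = np.array([0.0, 0.0, 50.0, 50.0, 0.0])
delta = np.ones(5)
voxel_idx = np.arange(5)

weights = render_weights(sigma, delta)
grid = project_mask(np.zeros(5), voxel_idx, weights, pixel_label=1.0)
pixel_value = render_mask(grid, voxel_idx, weights)

# A hypothetical incomplete mask rendered from another view.
other_view = np.array([[0.1, 0.2], [0.9, 0.3]])
prompt = pick_point_prompt(other_view)
```

In the toy scene almost all of the weight lands on the first dense voxel, so the 2D label is written to the visible surface rather than to empty space or occluded voxels, which is the intuition behind density guidance. In a full pipeline the selected point would be passed to SAM as a prompt, and the returned mask projected again.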

updated: Mon Apr 24 2023 17:57:15 GMT+0000 (UTC)

published: Mon Apr 24 2023 17:57:15 GMT+0000 (UTC)

arXiv
