Segment Anything in 3D with NeRFs

Jiazhong Cen; Zanwei Zhou; Jiemin Fang; Chen Yang; Wei Shen; Lingxi Xie; Dongsheng Jiang; Xiaopeng Zhang; Qi Tian

Recently, the Segment Anything Model (SAM) emerged as a powerful vision foundation model capable of segmenting anything in 2D images. This paper aims to generalize SAM to segment 3D objects. Rather than replicating the data acquisition and annotation procedure, which is costly in 3D, we design an efficient solution that leverages the Neural Radiance Field (NeRF) as a cheap, off-the-shelf prior connecting multi-view 2D images to 3D space. We refer to the proposed solution as SA3D, for Segment Anything in 3D. The user only needs to provide a manual segmentation prompt (e.g., rough points) for the target object in a single view, from which SAM generates the object's 2D mask in that view. Next, SA3D alternates between mask inverse rendering and cross-view self-prompting across various views to iteratively complete the 3D mask of the target object, which is constructed with voxel grids. The former projects the 2D mask obtained by SAM in the current view onto the 3D mask, guided by the density distribution learned by the NeRF; the latter automatically extracts reliable prompts from the NeRF-rendered 2D mask in another view as input to SAM. We show in experiments that SA3D adapts to various scenes and achieves 3D segmentation within minutes. Our research offers a generic and efficient methodology for lifting a 2D vision foundation model to 3D, as long as the 2D model can steadily address promptable segmentation across multiple views. The project page is at https://jumpat.github.io/SA3D/.
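The two alternating steps named in the abstract can be illustrated with a toy sketch along a single camera ray. This is not the authors' implementation: it assumes the standard volume-rendering weights for a NeRF, a simple additive update of per-sample mask scores, and a "pick the most confident pixel" rule for self-prompting. All function names and the `step` and `neg_weight` parameters are hypothetical, chosen only for illustration.

```python
import math


def render_weights(densities, deltas):
    """Standard volume-rendering weights w_i = T_i * (1 - exp(-sigma_i * delta_i)).

    densities: NeRF densities sampled along one ray (near to far).
    deltas: distances between consecutive samples.
    """
    weights = []
    transmittance = 1.0  # fraction of light that reaches the current sample
    for sigma, delta in zip(densities, deltas):
        alpha = 1.0 - math.exp(-sigma * delta)
        weights.append(transmittance * alpha)
        transmittance *= 1.0 - alpha
    return weights


def render_mask(weights, voxel_scores):
    """Render the 3D mask scores along the ray into one 2D mask value."""
    return sum(w * v for w, v in zip(weights, voxel_scores))


def inverse_render_step(weights, voxel_scores, in_mask, step=1.0, neg_weight=0.15):
    """Assumed update rule: push scores up where the 2D mask covers the pixel,
    and down (more gently) where it does not, in proportion to each sample's
    rendering weight. step and neg_weight are illustrative values only.
    """
    signed_step = step if in_mask else -step * neg_weight
    return [v + signed_step * w for v, w in zip(voxel_scores, weights)]


def pick_self_prompt(rendered_mask):
    """Assumed self-prompting rule: return (row, col) of the most confident pixel."""
    best = max(
        (value, r, c)
        for r, row in enumerate(rendered_mask)
        for c, value in enumerate(row)
    )
    return best[1], best[2]


# One ray: empty space first, then a dense surface, then an occluded sample.
weights = render_weights([0.0, 50.0, 50.0], [0.1, 0.1, 0.1])
scores = inverse_render_step(weights, [0.0, 0.0, 0.0], in_mask=True)
prompt = pick_self_prompt([[0.1, 0.2], [0.9, 0.3]])
```

In this sketch, a pixel covered by the 2D mask raises the score of the visible surface sample far more than that of empty space or occluded samples, which is one way density can guide the projection from 2D to 3D. The point returned by `pick_self_prompt` would then be passed to the 2D segmenter as the prompt for the next view.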

updated: Thu Jun 01 2023 13:58:46 GMT+0000 (UTC)

published: Mon Apr 24 2023 17:57:15 GMT+0000 (UTC)

arXiv
