Neural Sparse Voxel Fields

Lingjie Liu; Jiatao Gu; Kyaw Zaw Lin; Tat-Seng Chua; Christian Theobalt

Photo-realistic free-viewpoint rendering of real-world scenes using classical computer graphics techniques is challenging, because it requires the difficult step of capturing detailed appearance and geometry models. Recent studies have demonstrated promising results by learning scene representations that implicitly encode both geometry and appearance without 3D supervision. However, existing approaches in practice often produce blurry renderings caused by limited network capacity or the difficulty of finding accurate intersections of camera rays with the scene geometry. Synthesizing high-resolution imagery from these representations often requires time-consuming optical ray marching. In this work, we introduce Neural Sparse Voxel Fields (NSVF), a new neural scene representation for fast and high-quality free-viewpoint rendering. NSVF defines a set of voxel-bounded implicit fields organized in a sparse voxel octree to model local properties in each cell. We progressively learn the underlying voxel structures with a differentiable ray-marching operation from only a set of posed RGB images. With the sparse voxel octree structure, rendering novel views can be accelerated by skipping the voxels containing no relevant scene content. Our method is typically over 10 times faster than the state of the art (namely, NeRF (Mildenhall et al., 2020)) at inference time while achieving higher quality results. Furthermore, by utilizing an explicit sparse voxel representation, our method can easily be applied to scene editing and scene composition. We also demonstrate several challenging tasks, including multi-scene learning, free-viewpoint rendering of a moving human, and large-scale scene rendering. Code and data are available at our website: https://github.com/facebookresearch/NSVF.
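
The voxel-skipping idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes an axis-aligned grid of equally sized voxels, stores the occupied ones in a flat set rather than a sparse voxel octree, and the names `ray_aabb` and `sample_ray` are hypothetical. It only shows how samples along a ray can be restricted to voxels that contain content, so that empty space costs nothing.

```python
import math


def ray_aabb(origin, direction, box_min, box_max):
    """Slab test: return (t_near, t_far) along the ray, or None on a miss."""
    t_near, t_far = 0.0, math.inf
    for o, d, lo, hi in zip(origin, direction, box_min, box_max):
        if abs(d) < 1e-12:
            # Ray is parallel to this pair of slabs: it must start between them.
            if o < lo or o > hi:
                return None
            continue
        t0, t1 = (lo - o) / d, (hi - o) / d
        if t0 > t1:
            t0, t1 = t1, t0
        t_near, t_far = max(t_near, t0), min(t_far, t1)
        if t_near > t_far:
            return None
    return t_near, t_far


def sample_ray(origin, direction, occupied, voxel_size, step):
    """Return sample depths that lie only inside occupied voxels.

    `occupied` is a set of integer (i, j, k) voxel indices (an assumption
    made for this sketch; an octree would avoid the linear scan below).
    """
    segments = []
    for i, j, k in occupied:
        box_min = (i * voxel_size, j * voxel_size, k * voxel_size)
        box_max = tuple(c + voxel_size for c in box_min)
        hit = ray_aabb(origin, direction, box_min, box_max)
        if hit is not None:
            segments.append(hit)
    segments.sort()  # front-to-back order along the ray

    depths = []
    for t_near, t_far in segments:
        n = max(1, int((t_far - t_near) / step))
        width = (t_far - t_near) / n
        # Place one sample at the midpoint of each sub-interval.
        depths.extend(t_near + (s + 0.5) * width for s in range(n))
    return depths


# Two occupied voxels separated by empty space along the x axis.
occupied = {(0, 0, 0), (3, 0, 0)}
depths = sample_ray((-1.0, 0.5, 0.5), (1.0, 0.0, 0.0), occupied, 1.0, 0.25)
print(len(depths))  # 8 samples, none in the empty gap between t=2 and t=4
```

In a full renderer, each returned depth would be passed to the implicit field of its voxel to obtain density and color, which this sketch leaves out.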

updated: Wed Jan 06 2021 21:04:24 GMT+0000 (UTC)

published: Wed Jul 22 2020 17:51:31 GMT+0000 (UTC)

arXiv
