Direct Voxel Grid Optimization: Super-fast Convergence for Radiance Fields Reconstruction

Cheng Sun; Min Sun; Hwann-Tzong Chen

We present a super-fast convergence approach to reconstructing the per-scene radiance field from a set of images that capture the scene with known poses. This task, which is often applied to novel view synthesis, has recently been revolutionized by Neural Radiance Field (NeRF) for its state-of-the-art quality and flexibility. However, NeRF and its variants require a lengthy training time ranging from hours to days for a single scene. In contrast, our approach achieves NeRF-comparable quality and converges rapidly from scratch in less than 15 minutes with a single GPU. We adopt a representation consisting of a density voxel grid for scene geometry and a feature voxel grid with a shallow network for complex view-dependent appearance. Modeling with explicit and discretized volume representations is not new, but we propose two simple yet non-trivial techniques that contribute to fast convergence speed and high-quality output. First, we introduce the post-activation interpolation on voxel density, which is capable of producing sharp surfaces at lower grid resolutions. Second, direct voxel density optimization is prone to suboptimal geometry solutions, so we robustify the optimization process by imposing several priors. Finally, evaluation on five inward-facing benchmarks shows that our method matches, if not surpasses, NeRF's quality, yet it only takes about 15 minutes to train from scratch for a new scene.
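The abstract does not spell out what "post-activation interpolation" computes. The NumPy sketch below illustrates the idea in one dimension under stated assumptions that are not given in the abstract: a softplus density activation, the alpha = 1 - exp(-density * step) conversion from standard volume rendering, and linear interpolation as a 1D stand-in for trilinear interpolation in a 3D grid. The grid values, the `step` length, and all function names are hypothetical. It compares activating the stored values before interpolating (pre-activation) with interpolating the raw values and activating afterwards (post-activation) across a single voxel.

```python
import numpy as np


def softplus(x):
    # Numerically stable log(1 + exp(x)).
    return np.logaddexp(0.0, x)


def density_to_alpha(raw, step):
    # Assumed activation: softplus maps raw grid values to a non-negative
    # density, and alpha = 1 - exp(-density * step) is the opacity of a
    # ray segment of length `step` (standard volume-rendering quadrature).
    return 1.0 - np.exp(-softplus(raw) * step)


def interp_1d(grid, x):
    # Linear interpolation; grid[i] is stored at coordinate i.
    return np.interp(x, np.arange(len(grid)), grid)


def pre_activation_alpha(grid, x, step):
    # Activate at the grid vertices first, then interpolate the alphas.
    return interp_1d(density_to_alpha(grid, step), x)


def post_activation_alpha(grid, x, step):
    # Interpolate the raw values first, then apply the activation.
    return density_to_alpha(interp_1d(grid, x), step)


def transition_width(x, alpha, lo=0.1, hi=0.9):
    # Extent of x over which alpha lies strictly between lo and hi,
    # i.e. how blurry the empty-to-occupied transition is.
    inside = x[(alpha > lo) & (alpha < hi)]
    return inside.max() - inside.min()


# Hypothetical single voxel: empty space at x=0, occupied space at x=1.
grid = np.array([-20.0, 20.0])
step = 0.5
x = np.linspace(0.0, 1.0, 1001)

a_pre = pre_activation_alpha(grid, x, step)
a_post = post_activation_alpha(grid, x, step)

print(f"pre-activation width:  {transition_width(x, a_pre):.2f}")   # 0.80
print(f"post-activation width: {transition_width(x, a_post):.2f}")  # 0.15
```

Both variants agree at the grid vertices, but pre-activation can only produce a linear ramp between the two vertex alphas, whereas applying the nonlinearity after interpolation lets alpha jump from near 0 to near 1 within a fraction of a voxel. This is one way to read the abstract's claim that post-activation interpolation yields sharp surfaces at lower grid resolutions.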

updated: Mon Nov 22 2021 14:02:07 GMT+0000 (UTC)

published: Mon Nov 22 2021 14:02:07 GMT+0000 (UTC)

arXiv