RTMV: A Ray-Traced Multi-View Synthetic Dataset for Novel View Synthesis

Jonathan Tremblay; Moustafa Meshry; Alex Evans; Jan Kautz; Alexander Keller; Sameh Khamis; Charles Loop; Nathan Morrical; Koki Nagano; Towaki Takikawa; Stan Birchfield

We present a large-scale synthetic dataset for novel view synthesis consisting of ~300k images rendered from nearly 2000 complex scenes using high-quality ray tracing at high resolution (1600 x 1600 pixels). The dataset is orders of magnitude larger than existing synthetic datasets for novel view synthesis, thus providing a large unified benchmark for both training and evaluation. Built from four distinct sources of high-quality 3D meshes, the scenes of our dataset exhibit challenging variations in camera views, lighting, shape, materials, and textures. Because our dataset is too large for existing methods to process, we propose Sparse Voxel Light Field (SVLF), an efficient voxel-based light field approach for novel view synthesis that achieves performance comparable to NeRF on synthetic data, while being an order of magnitude faster to train and two orders of magnitude faster to render. SVLF achieves this speed by relying on a sparse voxel octree, careful voxel sampling (requiring only a handful of queries per ray), and a reduced network structure, as well as on ground truth depth maps at training time. Our dataset is generated by NViSII, a Python-based ray tracing renderer, which is designed to be simple for non-experts to use and share, flexible and powerful through its use of scripting, and able to create high-quality, physically based rendered images. Experiments with a subset of our dataset allow us to compare standard methods like NeRF and mip-NeRF for single-scene modeling, and pixelNeRF for category-level modeling, pointing toward the need for future improvements in this area.
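As a rough illustration of why a sparse voxel structure needs only a handful of queries per ray, the sketch below finds which occupied voxels a ray crosses using the standard slab method for ray-box intersection. It is not the SVLF implementation: the function names are hypothetical, and a plain Python set of voxel indices, scanned linearly, stands in for the sparse voxel octree described in the abstract.

```python
import math


def ray_box_intersection(origin, direction, box_min, box_max):
    """Slab method: return (t_near, t_far) where the ray crosses the box, or None."""
    t_near, t_far = 0.0, math.inf
    for o, d, lo, hi in zip(origin, direction, box_min, box_max):
        if abs(d) < 1e-12:
            # Ray is parallel to this pair of slabs: it misses unless
            # the origin already lies between them.
            if o < lo or o > hi:
                return None
            continue
        t0, t1 = (lo - o) / d, (hi - o) / d
        if t0 > t1:
            t0, t1 = t1, t0
        t_near, t_far = max(t_near, t0), min(t_far, t1)
        if t_near > t_far:
            return None
    return t_near, t_far


def voxels_hit_by_ray(origin, direction, occupied, voxel_size=1.0):
    """Return (t_near, t_far, index) for each occupied voxel the ray crosses,
    sorted front to back.

    `occupied` is a set of integer (i, j, k) voxel indices. Assumption: a real
    octree would skip empty space hierarchically instead of scanning the set.
    """
    hits = []
    for (i, j, k) in occupied:
        box_min = (i * voxel_size, j * voxel_size, k * voxel_size)
        box_max = tuple(c + voxel_size for c in box_min)
        span = ray_box_intersection(origin, direction, box_min, box_max)
        if span is not None:
            hits.append((span[0], span[1], (i, j, k)))
    hits.sort()
    return hits


if __name__ == "__main__":
    occupied = {(0, 0, 0), (2, 0, 0), (0, 3, 0)}
    for t_near, t_far, index in voxels_hit_by_ray(
        (-1.0, 0.5, 0.5), (1.0, 0.0, 0.0), occupied
    ):
        print(index, t_near, t_far)
```

In this toy scene, only the occupied voxels the ray actually crosses would need a network query. Empty space costs nothing, which is the general idea behind sampling sparse voxels rather than marching densely along the ray.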

updated: Sat May 14 2022 13:15:32 GMT+0000 (UTC)

published: Sat May 14 2022 13:15:32 GMT+0000 (UTC)

arXiv
