SceneDreamer: Unbounded 3D Scene Generation from 2D Image Collections

Zhaoxi Chen; Guangcong Wang; Ziwei Liu

In this work, we present SceneDreamer, an unconditional generative model for unbounded 3D scenes, which synthesizes large-scale 3D landscapes from random noise. Our framework is learned from in-the-wild 2D image collections only, without any 3D annotations. At the core of SceneDreamer is a principled learning paradigm comprising 1) an efficient yet expressive 3D scene representation, 2) a generative scene parameterization, and 3) an effective renderer that can leverage the knowledge from 2D images. Our framework starts from an efficient bird's-eye-view (BEV) representation generated from simplex noise, which consists of a height field and a semantic field. The height field represents the surface elevation of 3D scenes, while the semantic field provides detailed scene semantics. This BEV scene representation enables 1) representing a 3D scene with quadratic complexity, 2) disentangled geometry and semantics, and 3) efficient training. Furthermore, we propose a novel generative neural hash grid to parameterize the latent space given 3D positions and the scene semantics, which aims to encode generalizable features across scenes. Lastly, a neural volumetric renderer, learned from 2D image collections through adversarial training, is employed to produce photorealistic images. Extensive experiments demonstrate the effectiveness of SceneDreamer and its superiority over state-of-the-art methods in generating vivid yet diverse unbounded 3D worlds.
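The BEV representation described above can be illustrated with a small sketch. It is not the authors' code: it assumes pure-Python value noise as a stand-in for the simplex noise named in the abstract, and a hypothetical elevation-banded rule for the semantic field. The names `value_noise_2d`, `make_bev`, `query`, `storage_cost`, and the five labels are illustrative only. Storing one height and one label per ground cell costs 2·N² values, whereas a dense voxel grid of depth D over the same footprint costs N²·D, which is the quadratic-versus-cubic saving the abstract refers to.

```python
import random

# Hypothetical semantic labels (not the paper's label set).
WATER, SAND, GRASS, ROCK, SNOW = range(5)


def fade(t):
    # Smoothstep easing so the interpolated noise has no visible grid creases.
    return t * t * (3.0 - 2.0 * t)


def value_noise_2d(size, cell, rng):
    """Smooth noise in [0, 1) on a size x size grid.

    Stand-in for simplex noise: random values on a lattice with spacing
    `cell`, blended bilinearly with a smoothstep fade.
    """
    n = size // cell + 2
    lattice = [[rng.random() for _ in range(n)] for _ in range(n)]
    grid = []
    for y in range(size):
        gy, ty = y // cell, fade((y % cell) / cell)
        row = []
        for x in range(size):
            gx, tx = x // cell, fade((x % cell) / cell)
            top = lattice[gy][gx] * (1 - tx) + lattice[gy][gx + 1] * tx
            bottom = lattice[gy + 1][gx] * (1 - tx) + lattice[gy + 1][gx + 1] * tx
            row.append(top * (1 - ty) + bottom * ty)
        grid.append(row)
    return grid


def label_for(e):
    # Toy rule (assumption): assign a semantic class by normalized elevation band.
    if e < 0.30:
        return WATER
    if e < 0.35:
        return SAND
    if e < 0.60:
        return GRASS
    if e < 0.80:
        return ROCK
    return SNOW


def make_bev(size=64, max_height=32.0, seed=0):
    """Return (height, semantic): two size x size BEV fields."""
    rng = random.Random(seed)
    coarse = value_noise_2d(size, 16, rng)  # large-scale relief
    fine = value_noise_2d(size, 4, rng)     # small-scale detail
    height = [
        [max_height * (0.8 * c + 0.2 * f) for c, f in zip(rc, rf)]
        for rc, rf in zip(coarse, fine)
    ]
    semantic = [[label_for(h / max_height) for h in row] for row in height]
    return height, semantic


def query(height, semantic, x, y, z):
    """Resolve a 3D point via its BEV cell: returns (is_solid, label)."""
    i, j = int(y), int(x)  # floor to the containing cell; assumes 0 <= x, y < size
    return z <= height[i][j], semantic[i][j]


def storage_cost(size, depth):
    # Values stored by the two BEV fields vs. a dense voxel grid of the same extent.
    return 2 * size * size, size * size * depth


if __name__ == "__main__":
    height, semantic = make_bev(size=64, seed=1)
    print(query(height, semantic, 10.5, 20.2, 5.0))
    print(storage_cost(64, 32))  # (8192, 131072)
```

A 3D point is resolved by looking up its (x, y) cell: it is solid if z lies at or below the stored height, and its label comes from the separate semantic array. Keeping geometry and semantics in two independent 2D fields is one simple way to realize the disentanglement the abstract describes.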

updated: Thu Feb 02 2023 18:59:16 GMT+0000 (UTC)

published: Thu Feb 02 2023 18:59:16 GMT+0000 (UTC)

arXiv
