Compositional 3D Scene Generation using Locally Conditioned Diffusion

Ryan Po; Gordon Wetzstein

Designing complex 3D scenes has been a tedious, manual process requiring domain expertise. Emerging text-to-3D generative models show great promise for making this task more intuitive, but existing approaches are limited to object-level generation. We introduce locally conditioned diffusion as an approach to compositional scene diffusion, providing control over semantic parts using text prompts and bounding boxes while ensuring seamless transitions between these parts. We demonstrate a score distillation sampling-based text-to-3D synthesis pipeline that enables compositional 3D scene generation at a higher fidelity than relevant baselines.
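The abstract does not spell out how text prompts and bounding boxes are combined, so the following is only a minimal sketch of one plausible reading: at each denoising step, a noise prediction is computed for every prompt and the predictions are blended with binary masks derived from the bounding boxes. The function names (`box_mask`, `composite_noise`, `toy_predict_noise`), the example prompts, and the assumption that boxes are non-overlapping and tile the image are all hypothetical and not taken from the paper.

```python
from typing import Callable, List, Sequence, Tuple

Grid = List[List[float]]
Box = Tuple[int, int, int, int]  # (row0, col0, row1, col1), end-exclusive


def box_mask(height: int, width: int, box: Box) -> Grid:
    """Binary mask that is 1.0 inside the bounding box and 0.0 elsewhere."""
    r0, c0, r1, c1 = box
    return [[1.0 if r0 <= r < r1 and c0 <= c < c1 else 0.0
             for c in range(width)] for r in range(height)]


def composite_noise(x_t: Grid, prompts: Sequence[str], boxes: Sequence[Box],
                    predict_noise: Callable[[Grid, str], Grid]) -> Grid:
    """Blend per-prompt noise predictions using the bounding-box masks.

    Assumes the boxes do not overlap, so each pixel receives the prediction
    of exactly one prompt (or zero if no box covers it).
    """
    h, w = len(x_t), len(x_t[0])
    out = [[0.0] * w for _ in range(h)]
    for prompt, box in zip(prompts, boxes):
        eps = predict_noise(x_t, prompt)  # full-image prediction for this prompt
        mask = box_mask(h, w, box)
        for r in range(h):
            for c in range(w):
                out[r][c] += mask[r][c] * eps[r][c]
    return out


def toy_predict_noise(x_t: Grid, prompt: str) -> Grid:
    """Hypothetical stand-in for a text-conditioned denoiser.

    Returns a constant image whose value is the prompt length, purely so the
    compositing is easy to inspect.
    """
    return [[float(len(prompt)) for _ in row] for row in x_t]


# A 2x4 "image" split into a left and a right region, one prompt each.
x_t = [[0.0] * 4 for _ in range(2)]
result = composite_noise(
    x_t,
    ["a lighthouse", "a beach"],
    [(0, 0, 2, 2), (0, 2, 2, 4)],
    toy_predict_noise,
)
```

Because every prompt's prediction is computed on the full image and only masked afterwards, each region is denoised with awareness of its neighbors, which is one way the seamless transitions mentioned above could arise. In a real pipeline `toy_predict_noise` would be replaced by a pretrained text-conditioned diffusion model.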

updated: Thu Mar 23 2023 00:29:24 GMT+0000 (UTC)

published: Tue Mar 21 2023 22:37:16 GMT+0000 (UTC)

arXiv
