3D-aware Image Generation using 2D Diffusion Models

Jianfeng Xiang; Jiaolong Yang; Binbin Huang; Xin Tong

In this paper, we introduce a novel 3D-aware image generation method that leverages 2D diffusion models. We formulate the 3D-aware image generation task as multiview 2D image set generation, and further as a sequential unconditional-conditional multiview image generation process. This allows us to use 2D diffusion models to boost the generative modeling power of the method. Additionally, we incorporate depth information from monocular depth estimators to construct the training data for the conditional diffusion model using only still images. We train our method on a large-scale dataset, ImageNet, which previous methods have not addressed. It produces high-quality images and significantly outperforms prior methods. Furthermore, our approach can generate instances with large view angles, even though the training images are diverse and unaligned, gathered from "in-the-wild" real-world environments.
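
One way to read the sequential unconditional-conditional formulation is as a chain-rule factorization of the joint distribution over a multiview image set. The sketch below is illustrative only: the symbols $x_0, \dots, x_N$ (views of a single instance) and $N$ (number of additional views) are assumed notation, not taken from the abstract.

```latex
% Assumed notation: x_0 is the first view, x_1..x_N are further views
% of the same instance. The identity itself is the standard chain rule.
\begin{equation}
  p(x_0, x_1, \dots, x_N)
  = \underbrace{p(x_0)}_{\text{unconditional model}}
    \prod_{n=1}^{N}
    \underbrace{p(x_n \mid x_0, \dots, x_{n-1})}_{\text{conditional model}}
\end{equation}
```

Under this reading, the first factor would be handled by an unconditional 2D diffusion model and each remaining factor by a conditional one. How the previously generated views are summarized as conditioning input is not stated in the abstract, so the equation leaves that open.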

updated: Fri Mar 31 2023 09:03:18 GMT+0000 (UTC)

published: Fri Mar 31 2023 09:03:18 GMT+0000 (UTC)

arXiv
