IC3D: Image-Conditioned 3D Diffusion for Shape Generation

Cristian Sbrolli; Paolo Cudrano; Matteo Frosi; Matteo Matteucci


In recent years, Denoising Diffusion Probabilistic Models (DDPMs) have obtained state-of-the-art results in many generative tasks, outperforming GANs and other classes of generative models. In particular, they have achieved impressive results in various image generation sub-tasks, including conditional generation tasks such as text-guided image synthesis. Given the success of DDPMs in 2D generation, they have more recently been applied to 3D shape generation, outperforming previous approaches and reaching state-of-the-art results. However, existing 3D DDPM works make little or no use of guidance, being mainly unconditional or class-conditional. In this work, we present IC3D, an Image-Conditioned 3D Diffusion model that generates 3D shapes guided by images. To guide our DDPM, we introduce CISP (Contrastive Image-Shape Pre-training), a model that jointly embeds images and shapes through contrastive pre-training, inspired by the literature on text-to-image DDPMs. Our generative diffusion model outperforms the state of the art in 3D generation quality and diversity. Furthermore, despite IC3D's generative nature, a side-by-side human evaluation shows that its generated shapes are preferred by human evaluators over those of a state-of-the-art single-view 3D reconstruction model in terms of quality and coherence with the query image. Ablation studies show the importance of CISP for learning structural integrity properties, which are crucial for realistic generation. Such biases yield a regular embedding space and allow for interpolation and conditioning on out-of-distribution images, while also making IC3D capable of generating coherent but diverse completions of occluded views and enabling its adoption in controlled real-life applications.
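The abstract does not spell out the CISP training objective. As a rough illustration of what "jointly embedding images and shapes by contrastive pre-training" can look like, the sketch below assumes a CLIP-style symmetric cross-entropy loss over cosine similarities, which is one common choice in the text-to-image literature and not necessarily the authors' formulation. The function names, the toy embeddings, and the temperature value of 0.07 are all hypothetical.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (assumes the vector is non-zero)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def log_softmax(row):
    """Numerically stable log-softmax over a list of logits."""
    peak = max(row)
    log_sum = peak + math.log(sum(math.exp(x - peak) for x in row))
    return [x - log_sum for x in row]


def symmetric_contrastive_loss(image_embs, shape_embs, temperature=0.07):
    """Assumed CLIP-style loss: the i-th image and i-th shape form a positive pair.

    Every other image/shape combination in the batch acts as a negative.
    """
    images = [l2_normalize(v) for v in image_embs]
    shapes = [l2_normalize(v) for v in shape_embs]
    n = len(images)

    # Temperature-scaled cosine similarity between every image and every shape.
    logits = [
        [sum(a * b for a, b in zip(img, shp)) / temperature for shp in shapes]
        for img in images
    ]

    # Image-to-shape direction: each row should peak on its diagonal entry.
    loss_image_to_shape = -sum(log_softmax(logits[k])[k] for k in range(n)) / n

    # Shape-to-image direction: same computation on the transposed matrix.
    columns = [[logits[r][c] for r in range(n)] for c in range(n)]
    loss_shape_to_image = -sum(log_softmax(columns[k])[k] for k in range(n)) / n

    return 0.5 * (loss_image_to_shape + loss_shape_to_image)


# Toy batch: two image embeddings and two shape embeddings (made-up values).
image_batch = [[1.0, 0.0], [0.0, 1.0]]
matched_shapes = [[1.0, 0.0], [0.0, 1.0]]
swapped_shapes = [[0.0, 1.0], [1.0, 0.0]]

matched_loss = symmetric_contrastive_loss(image_batch, matched_shapes)
swapped_loss = symmetric_contrastive_loss(image_batch, swapped_shapes)
```

In this toy batch the correctly paired embeddings give a much smaller loss than the swapped ones, which is the signal that would pull matching image and shape embeddings together during pre-training. In the actual model the embeddings would come from learned image and shape encoders rather than fixed lists.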

updated: Fri Mar 31 2023 18:43:30 GMT+0000 (UTC)

published: Sun Nov 20 2022 04:21:42 GMT+0000 (UTC)

arXiv
