DreamSparse: Escaping from Plato's Cave with 2D Frozen Diffusion Model Given Sparse Views

Paul Yoo; Jiaxian Guo; Yutaka Matsuo; Shixiang Shane Gu

Synthesizing novel view images from a few views is a challenging but practical problem. Because so little information is available in such few-view settings, existing methods often struggle to produce high-quality results or require per-object optimization. In this work, we explore leveraging the strong 2D priors in pre-trained diffusion models for synthesizing novel view images. However, 2D diffusion models lack 3D awareness, which leads to distorted image synthesis and compromises object identity. To address these problems, we propose DreamSparse, a framework that enables a frozen pre-trained diffusion model to generate geometry- and identity-consistent novel view images. Specifically, DreamSparse incorporates a geometry module designed to capture 3D features from sparse views as a 3D prior. A spatial guidance model is then introduced to convert these 3D feature maps into spatial information for the generative process. This information is used to guide the pre-trained diffusion model, enabling it to generate geometrically consistent images without tuning it. Leveraging the strong image priors in pre-trained diffusion models, DreamSparse is capable of synthesizing high-quality novel views for both object-level and scene-level images and of generalizing to open-set images. Experimental results demonstrate that our framework can effectively synthesize novel view images from sparse views and outperforms baselines on both trained and open-set category images. More results can be found on our project page: https://sites.google.com/view/dreamsparse-webpage.
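The data flow described in the abstract (sparse views, then a geometry module, then a spatial guidance model, then a frozen diffusion model) can be illustrated with a deliberately tiny sketch. Everything below is an assumption made for illustration and is not taken from the paper: the names `aggregate_views`, `SpatialGuidance`, `FrozenDenoiser`, and `novel_view_step` are hypothetical, averaging stands in for the real geometry module, and adding the guidance as a residual is only one plausible way to inject spatial information. The one point it tries to capture is that the pre-trained model's weights stay fixed while only the guidance parameters are trainable.

```python
from dataclasses import dataclass
from typing import List, Tuple

Vector = List[float]


def aggregate_views(view_features: List[Vector]) -> Vector:
    """Hypothetical stand-in for the geometry module.

    Averages per-view feature vectors into one "3D" feature.
    (Assumption: the real module is far more elaborate.)
    """
    n = len(view_features)
    return [sum(column) / n for column in zip(*view_features)]


@dataclass
class SpatialGuidance:
    """Toy trainable module: a per-channel scale applied to the 3D feature."""
    scale: Vector  # trainable parameters in this sketch

    def __call__(self, feat_3d: Vector) -> Vector:
        return [s * f for s, f in zip(self.scale, feat_3d)]


@dataclass(frozen=True)
class FrozenDenoiser:
    """Toy stand-in for the pre-trained diffusion model.

    frozen=True means the weights cannot be reassigned after construction.
    """
    weight: Tuple[float, ...]

    def __call__(self, noisy: Vector, guidance: Vector) -> Vector:
        # Assumption: guidance enters as an additive residual.
        return [w * x + g for w, x, g in zip(self.weight, noisy, guidance)]


def novel_view_step(views: List[Vector], noisy: Vector,
                    guidance: SpatialGuidance,
                    denoiser: FrozenDenoiser) -> Vector:
    """One guided step: aggregate views, build guidance, run frozen model."""
    feat_3d = aggregate_views(views)
    return denoiser(noisy, guidance(feat_3d))


if __name__ == "__main__":
    views = [[1.0, 2.0], [3.0, 4.0]]
    out = novel_view_step(views, [0.0, 1.0],
                          SpatialGuidance(scale=[0.5, 1.0]),
                          FrozenDenoiser(weight=(1.0, 2.0)))
    print(out)
```

In a real training loop under these assumptions, gradients would update only `SpatialGuidance.scale` (and the geometry module), never `FrozenDenoiser.weight`.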

updated: Wed Jun 14 2023 10:16:26 GMT+0000 (UTC)

published: Tue Jun 06 2023 05:26:26 GMT+0000 (UTC)

arXiv
