Diffusion with Forward Models: Solving Stochastic Inverse Problems Without Direct Supervision

Ayush Tewari; Tianwei Yin; George Cazenavette; Semon Rezchikov; Joshua B. Tenenbaum; Frédo Durand; William T. Freeman; Vincent Sitzmann

Denoising diffusion models are a powerful class of generative models that capture complex distributions of real-world signals. However, their applicability is limited to scenarios where training samples are readily available, which is not always the case in real-world applications. For example, in inverse graphics, the goal is to generate samples from a distribution of 3D scenes that align with a given image, but ground-truth 3D scenes are unavailable and only 2D images are accessible. To address this limitation, we propose a novel class of denoising diffusion probabilistic models that learn to sample from distributions of signals that are never directly observed. Instead, these signals are measured indirectly through a known differentiable forward model, which produces partial observations of the unknown signal. Our approach integrates the forward model directly into the denoising process. This integration connects the generative modeling of observations with the generative modeling of the underlying signals, allowing for end-to-end training of a conditional generative model over signals. During inference, our approach enables sampling from the distribution of underlying signals that are consistent with a given partial observation. We demonstrate the effectiveness of our method on three challenging computer vision tasks. For instance, in the context of inverse graphics, our model enables direct sampling from the distribution of 3D scenes that align with a single 2D input image.
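
The central idea of the abstract is to route the denoiser's prediction through a signal and the known forward model, so that the training loss is computed on observations only. It can be illustrated on a toy problem. This is a minimal, hypothetical sketch and not the authors' implementation. Every element of it is an assumption chosen so the example runs with the Python standard library: the 2-D "signal", the linear forward model (`forward_model`), the linear denoiser (`denoise_to_signal`), the cosine noise schedule, the extra loss term on the re-rendered context view (added here so the toy signal is fully pinned down), and the deterministic DDIM-style sampler. The hidden signal is drawn inside `sample_observations` and immediately discarded, so training never sees it.

```python
import math
import random

# Hypothetical toy inverse problem (not from the paper): the hidden signal is a
# 2-vector s = (a, b), and a known linear "renderer" observes it from a view w as w . s.
CONTEXT_VIEW = (1.0, 0.0)  # the given partial observation reveals only a
TARGET_VIEW = (1.0, 1.0)   # a second observation of the same signal reveals a + b
VIEWS = (CONTEXT_VIEW, TARGET_VIEW)
T = 50                     # number of diffusion steps (assumed)


def forward_model(signal, view):
    """Known, differentiable forward model: observation = <view, signal>."""
    return view[0] * signal[0] + view[1] * signal[1]


def alpha_bar(t):
    """Cosine noise schedule: 1.0 at t = 0, small but positive at t = T."""
    return math.cos(0.5 * math.pi * t / (T + 1)) ** 2


def sample_observations(rng):
    """Training pair (context obs, target obs); the signal itself is discarded."""
    s = (rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0))
    return forward_model(s, CONTEXT_VIEW), forward_model(s, TARGET_VIEW)


def denoise_to_signal(params, x_t, ab, ctx):
    """Toy linear denoiser: (noisy target obs, noise level, context) -> signal."""
    feats = (math.sqrt(ab) * x_t, ctx, ab * ctx)
    signal = [sum(w * f for w, f in zip(row, feats)) for row in params]
    return signal, feats


def observation_loss(params, ctx, tgt, t, eps):
    """Noise the target obs, predict a signal, re-render it, compare to observations."""
    ab = alpha_bar(t)
    x_t = math.sqrt(ab) * tgt + math.sqrt(1.0 - ab) * eps
    s_hat, feats = denoise_to_signal(params, x_t, ab, ctx)
    errs = [forward_model(s_hat, v) - o for v, o in zip(VIEWS, (ctx, tgt))]
    return sum(e * e for e in errs), errs, feats


def mean_loss(params, data):
    """Average observation-space loss over (ctx, tgt, t, eps) tuples."""
    return sum(observation_loss(params, *d)[0] for d in data) / len(data)


def train(steps, lr, rng):
    """Plain SGD on the observation-space loss; no ground-truth signal is used."""
    params = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]  # one weight row per signal component
    for _ in range(steps):
        ctx, tgt = sample_observations(rng)
        t, eps = rng.randint(1, T), rng.gauss(0.0, 1.0)
        _, errs, feats = observation_loss(params, ctx, tgt, t, eps)
        # Chain rule through the linear renderer: dL/dW[i][j] = sum_v 2 * err_v * v[i] * feat_j
        for view, err in zip(VIEWS, errs):
            for i, row in enumerate(params):
                for j, f in enumerate(feats):
                    row[j] -= lr * 2.0 * err * view[i] * f
    return params


def sample_signal(params, ctx, rng):
    """Deterministic DDIM-style sampling in observation space; returns a signal."""
    x_t = rng.gauss(0.0, 1.0)
    for t in range(T, 0, -1):
        ab_t, ab_prev = alpha_bar(t), alpha_bar(t - 1)
        s_hat, _ = denoise_to_signal(params, x_t, ab_t, ctx)
        x0_hat = forward_model(s_hat, TARGET_VIEW)  # render the predicted signal
        eps_hat = (x_t - math.sqrt(ab_t) * x0_hat) / math.sqrt(1.0 - ab_t)
        x_t = math.sqrt(ab_prev) * x0_hat + math.sqrt(1.0 - ab_prev) * eps_hat
    return s_hat
```

With `params = train(20000, 0.01, random.Random(0))`, repeated calls to `sample_signal(params, 0.7, rng)` should return signals whose first component stays near the given context value 0.7 while the unobserved second component varies from call to call. The denoiser is linear only so that its gradient can be written by hand; a practical model would be a neural network trained with automatic differentiation.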

Updated: Fri Nov 17 2023 04:17:34 UTC

Published: Tue Jun 20 2023 17:53:00 UTC

arXiv