Reduce, Reuse, Recycle: Compositional Generation with Energy-Based Diffusion Models and MCMC

Yilun Du; Conor Durkan; Robin Strudel; Joshua B. Tenenbaum; Sander Dieleman; Rob Fergus; Jascha Sohl-Dickstein; Arnaud Doucet; Will Grathwohl

Since their introduction, diffusion models have quickly become the prevailing approach to generative modeling in many domains. They can be interpreted as learning the gradients of a time-varying sequence of log-probability density functions. This interpretation has motivated classifier-based and classifier-free guidance as methods for post-hoc control of diffusion models. In this work, we build upon these ideas using the score-based interpretation of diffusion models, and explore alternative ways to condition, modify, and reuse diffusion models for tasks involving compositional generation and guidance. In particular, we investigate why certain types of composition fail using current techniques and present a number of solutions. We conclude that the sampler (not the model) is responsible for this failure and propose new samplers, inspired by MCMC, which enable successful compositional generation. Further, we propose an energy-based parameterization of diffusion models which enables the use of new compositional operators and more sophisticated, Metropolis-corrected samplers. Intriguingly, we find these samplers lead to notable improvements in compositional generation across a wide set of problems such as classifier-guided ImageNet modeling and compositional text-to-image generation.
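
As a rough illustration of the ideas in the abstract, the sketch below composes two toy energy functions as a product (summing the energies, and therefore the scores) and samples from the result with a Metropolis-adjusted Langevin step. Everything in it is an assumption made for illustration: the 1-D Gaussian energies, the helper names `gaussian_energy`, `compose_product`, and `mala`, and the use of a single fixed noise level. It is not the authors' implementation, which works with learned, time-dependent models.

```python
import numpy as np

def gaussian_energy(mu, sigma):
    """Toy stand-in for a learned model: returns (energy, score), score = -dE/dx."""
    energy = lambda x: 0.5 * ((x - mu) / sigma) ** 2
    score = lambda x: -(x - mu) / sigma ** 2
    return energy, score

def compose_product(models):
    """Product of densities: energies add, so scores add as well."""
    energies, scores = zip(*models)
    energy = lambda x: sum(e(x) for e in energies)
    score = lambda x: sum(s(x) for s in scores)
    return energy, score

def mala(energy, score, x0, step, n_steps, rng):
    """Metropolis-adjusted Langevin; each entry of x is an independent 1-D chain."""
    x = np.array(x0, dtype=float)
    for _ in range(n_steps):
        # Langevin proposal: drift along the score plus Gaussian noise.
        prop = x + step * score(x) + np.sqrt(2 * step) * rng.standard_normal(x.shape)
        # Log densities of the forward and reverse proposals (up to a shared constant).
        log_q_fwd = -((prop - x - step * score(x)) ** 2) / (4 * step)
        log_q_rev = -((x - prop - step * score(prop)) ** 2) / (4 * step)
        # The Metropolis correction needs energies, not only scores.
        log_alpha = energy(x) - energy(prop) + log_q_rev - log_q_fwd
        accept = np.log(rng.random(x.shape)) < log_alpha
        x = np.where(accept, prop, x)
    return x

rng = np.random.default_rng(0)
energy, score = compose_product([gaussian_energy(-1.0, 1.0), gaussian_energy(2.0, 0.5)])
samples = mala(energy, score, x0=np.zeros(2000), step=0.1, n_steps=200, rng=rng)
print(samples.mean(), samples.var())
```

For these two Gaussians the product is again Gaussian, with mean 1.4 and variance 0.2, so the printed statistics should land near those values. The accept/reject step is the reason an energy-based parameterization matters in this sketch: a score-only model could drive the proposal but could not evaluate the energy difference used in `log_alpha`.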

updated: Sat Nov 18 2023 20:19:04 GMT+0000 (UTC)

published: Wed Feb 22 2023 18:48:46 GMT+0000 (UTC)

arXiv
