Towards Practical Plug-and-Play Diffusion Models

Hyojun Go; Yunsung Lee; Jin-Young Kim; Seunghyun Lee; Myeongho Jeong; Hyun Seung Lee; Seungtaek Choi

Diffusion-based generative models have achieved remarkable success in image generation. Their guidance formulation allows an external model to control the generation process for various tasks in a plug-and-play manner, without fine-tuning the diffusion model. However, directly using publicly available off-the-shelf models for guidance fails because they perform poorly on noisy inputs. The existing practice is therefore to fine-tune the guidance model on labeled data corrupted with noise. In this paper, we argue that this practice is limited in two respects: (1) handling inputs across a very wide range of noise levels is too hard for a single guidance model; (2) collecting labeled datasets hinders scaling to various tasks. To tackle these limitations, we propose a novel strategy that leverages multiple experts, where each expert is specialized in a particular noise range and guides the reverse process of the diffusion at its corresponding timesteps. However, as it is infeasible to manage multiple networks and utilize labeled data, we present a practical guidance framework termed Practical Plug-And-Play (PPAP), which leverages parameter-efficient fine-tuning and data-free knowledge transfer. We conduct extensive ImageNet class-conditional generation experiments to show that our method can successfully guide diffusion with a small number of trainable parameters and no labeled data. Finally, we show that image classifiers, depth estimators, and semantic segmentation models can guide the publicly available GLIDE through our framework in a plug-and-play manner. Our code is available at https://github.com/riiid/PPAP.
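The multi-expert idea in the abstract can be illustrated with a minimal Python sketch. It is not taken from the PPAP code: the function names `expert_index`, `guided_mean`, and `make_toy_expert` are hypothetical, the equal-width split of timesteps among experts is an assumption, and the mean shift follows the standard classifier-guidance form (mean plus scale times variance times the gradient of the log-probability), which the abstract does not spell out.

```python
import numpy as np


def expert_index(t: int, num_timesteps: int, num_experts: int) -> int:
    """Pick the expert responsible for timestep t.

    Assumption: the timesteps [0, num_timesteps) are split into
    equal-width noise ranges, one per expert.
    """
    if not 0 <= t < num_timesteps:
        raise ValueError("t must lie in [0, num_timesteps)")
    return t * num_experts // num_timesteps


def guided_mean(mean, variance, grad_log_prob, scale=1.0):
    """Shift the reverse-process mean in the standard classifier-guidance form."""
    return mean + scale * variance * grad_log_prob


def make_toy_expert(target):
    """Hypothetical stand-in for a guidance model.

    Returns the gradient of log N(x; target, I) with respect to x,
    which is simply (target - x).
    """
    def grad(x_t):
        return target - x_t
    return grad


# Five toy experts, each pulling samples toward a different target.
experts = [make_toy_expert(np.full(2, float(k))) for k in range(5)]

x_t = np.zeros(2)                      # current noisy sample (toy)
k = expert_index(700, 1000, 5)         # expert covering timestep 700
shifted = guided_mean(x_t, 0.1, experts[k](x_t), scale=2.0)
```

In a real setting each expert would be a guidance network fine-tuned for its noise range, and the gradient would come from backpropagating through that network at the current noisy sample.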

updated: Mon Mar 27 2023 13:17:48 GMT+0000 (UTC)

published: Mon Dec 12 2022 15:29:46 GMT+0000 (UTC)

arXiv
