PARASOL: Parametric Style Control for Diffusion Image Synthesis

Gemma Canet Tarrés; Dan Ruta; Tu Bui; John Collomosse

We propose PARASOL, a multi-modal synthesis model that enables disentangled, parametric control of an image's visual style by jointly conditioning synthesis on both content and a fine-grained visual style embedding. We train a latent diffusion model (LDM) using modality-specific losses and adapt classifier-free guidance to encourage disentangled control over independent content and style modalities at inference time. We leverage auxiliary semantic and style-based search to create training triplets for supervision of the LDM, ensuring complementarity of content and style cues. PARASOL shows promise for enabling nuanced control over visual style in diffusion models for image creation and stylization, as well as for generative search, where text-based search results may be adapted to more closely match user intent by interpolating both content and style descriptors.
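The abstract does not give the guidance formula, so the following is only a minimal sketch of one common way to extend classifier-free guidance to two conditions, plus a linear interpolation of descriptors. The function names (`lerp`, `guided_noise`), the nested combination rule, and the use of plain Python lists in place of tensors are all assumptions made for illustration, not the authors' formulation.

```python
from typing import List, Sequence


def lerp(a: Sequence[float], b: Sequence[float], t: float) -> List[float]:
    """Linearly interpolate between two descriptors of equal length.

    Hypothetical helper: t=0 returns a, t=1 returns b.
    """
    if len(a) != len(b):
        raise ValueError("descriptors must have the same length")
    return [(1.0 - t) * x + t * y for x, y in zip(a, b)]


def guided_noise(
    eps_uncond: Sequence[float],
    eps_content: Sequence[float],
    eps_both: Sequence[float],
    g_content: float,
    g_style: float,
) -> List[float]:
    """Combine three noise predictions with separate guidance weights.

    eps_uncond:  prediction with both conditions dropped
    eps_content: prediction with only the content condition
    eps_both:    prediction with content and style conditions

    Assumed rule (a generic two-condition variant, not taken from the paper):
        eps = eps_uncond
              + g_content * (eps_content - eps_uncond)
              + g_style   * (eps_both - eps_content)
    """
    return [
        u + g_content * (c - u) + g_style * (b - c)
        for u, c, b in zip(eps_uncond, eps_content, eps_both)
    ]
```

Under this assumed rule, setting `g_style` to 0 ignores the style condition entirely, while setting both weights to 1 reproduces the fully conditioned prediction. Blending two style descriptors with `lerp` before conditioning is one simple way to picture the descriptor interpolation the abstract mentions.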

updated: Thu May 02 2024 02:21:18 GMT+0000 (UTC)

published: Sat Mar 11 2023 17:30:36 GMT+0000 (UTC)

arXiv
