Encoder-based Domain Tuning for Fast Personalization of Text-to-Image Models

Rinon Gal; Moab Arar; Yuval Atzmon; Amit H. Bermano; Gal Chechik; Daniel Cohen-Or

Text-to-image personalization aims to teach a pre-trained diffusion model to reason about novel, user-provided concepts, embedding them into new scenes guided by natural language prompts. However, current personalization approaches struggle with lengthy training times, high storage requirements, or loss of identity. To overcome these limitations, we propose an encoder-based domain-tuning approach. Our key insight is that by underfitting on a large set of concepts from a given domain, we can improve generalization and create a model that is more amenable to quickly adding novel concepts from the same domain. Specifically, we employ two components. The first is an encoder that takes as input a single image of a target concept from a given domain (e.g., a specific face) and learns to map it into a word embedding representing the concept. The second is a set of regularized weight offsets for the text-to-image model that learn how to effectively ingest additional concepts. Together, these components guide the learning of unseen concepts, allowing us to personalize a model using only a single image and as few as 5 training steps, accelerating personalization from dozens of minutes to seconds while preserving quality.
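The two-component idea in the abstract can be illustrated with a toy sketch. This is not the paper's implementation: a frozen 2x2 matrix stands in for the pre-trained text-to-image model, a hardcoded vector stands in for the encoder's predicted word embedding, and the squared-error loss, the L2 penalty on the offsets, and all names (`personalize`, `lam`, `lr`) are assumptions chosen for illustration. The abstract says only that the weight offsets are "regularized"; the specific penalty here is a guess at one plausible form.

```python
# Toy sketch (assumptions throughout): a frozen linear "model", a trainable
# weight offset with an L2 penalty, and an embedding initialized from a
# hypothetical encoder prediction, all tuned for a handful of steps.

def matvec(m, v):
    """Multiply matrix m (list of rows) by vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def loss_fn(base, offset, emb, target, lam):
    """Squared reconstruction error plus an L2 penalty on the offsets."""
    eff = [[b + o for b, o in zip(b_row, o_row)]
           for b_row, o_row in zip(base, offset)]
    resid = [y - t for y, t in zip(matvec(eff, emb), target)]
    recon = sum(r * r for r in resid)
    reg = lam * sum(o * o for row in offset for o in row)
    return recon + reg, eff, resid

def personalize(base, emb_init, target, steps=5, lr=0.1, lam=0.01):
    """Tune the embedding and the weight offsets; the base weights stay frozen."""
    rows, cols = len(base), len(base[0])
    offset = [[0.0] * cols for _ in range(rows)]  # offsets start at zero
    emb = list(emb_init)                          # start from the encoder's guess
    history = []
    for _ in range(steps):
        loss, eff, resid = loss_fn(base, offset, emb, target, lam)
        history.append(loss)
        # Gradients of the loss with respect to the embedding and the offsets,
        # both computed before either parameter is updated.
        grad_emb = [2 * sum(eff[i][j] * resid[i] for i in range(rows))
                    for j in range(cols)]
        grad_off = [[2 * resid[i] * emb[j] + 2 * lam * offset[i][j]
                     for j in range(cols)] for i in range(rows)]
        emb = [e - lr * g for e, g in zip(emb, grad_emb)]
        offset = [[o - lr * g for o, g in zip(o_row, g_row)]
                  for o_row, g_row in zip(offset, grad_off)]
    history.append(loss_fn(base, offset, emb, target, lam)[0])
    return emb, offset, history

BASE = [[1.0, 0.0], [0.0, 1.0]]   # frozen stand-in for pre-trained weights
ENCODER_GUESS = [0.8, 0.1]        # hypothetical encoder output for one image
TARGET = [1.0, 0.0]               # stand-in for the concept to reconstruct

emb, offset, history = personalize(BASE, ENCODER_GUESS, TARGET)
print("loss before:", history[0], "loss after 5 steps:", history[-1])
```

Because the embedding starts from the encoder's estimate rather than from scratch, only a few gradient steps are needed in this toy setting, which mirrors the intuition behind the "as few as 5 training steps" claim; the real method operates on a diffusion model and is considerably more involved.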

updated: Sun Mar 05 2023 15:48:51 GMT+0000 (UTC)

published: Thu Feb 23 2023 18:46:41 GMT+0000 (UTC)

arXiv
