UPGPT: Universal Diffusion Model for Person Image Generation, Editing and Pose Transfer

Soon Yau Cheong; Armin Mustafa; Andrew Gilbert

Text-to-image (T2I) models such as StableDiffusion have been used to generate high-quality images of people. However, because the generation process is random, the person's appearance (e.g., pose, face, and clothing) differs between samples even when the same text prompt is used. This inconsistency makes T2I unsuitable for pose transfer. We address this by proposing a multimodal diffusion model that accepts text, pose, and visual prompts. Our model is the first unified method to perform all person image tasks: generation, pose transfer, and mask-less editing. We also pioneer the direct use of low-dimensional 3D body model parameters to demonstrate a new capability: simultaneous pose and camera view interpolation while maintaining the person's appearance.

updated: Wed Jul 26 2023 17:13:12 GMT+0000 (UTC)

published: Tue Apr 18 2023 10:05:37 GMT+0000 (UTC)

arXiv
