TD-GEM: Text-Driven Garment Editing Mapper

Reza Dadfar; Sanaz Sabzevari; Mårten Björkman; Danica Kragic

Language-based fashion image editing allows users to try out variations of desired garments through text prompts. Inspired by research on manipulating latent representations in StyleCLIP and HairCLIP, we focus on these latent spaces for editing fashion items in full-body human datasets. Fashion image editing remains difficult because garment shapes and textures are complex and human poses are diverse. In this paper, we propose an editing optimizer scheme called Text-Driven Garment Editing Mapper (TD-GEM), which aims to edit fashion items in a disentangled way. To this end, we first obtain a latent representation of an image through generative adversarial network (GAN) inversion methods such as Encoder for Editing (e4e) or Pivotal Tuning Inversion (PTI) to obtain more accurate results. An optimization based on Contrastive Language-Image Pre-training (CLIP) is then used to guide the latent representation of the fashion image in the direction of a target attribute expressed as a text prompt. TD-GEM manipulates the image accurately according to the target attribute while keeping other parts of the image untouched. In the experiments, we evaluate TD-GEM on two different attributes (i.e., "color" and "sleeve length") and show that it effectively generates realistic images compared to recent manipulation schemes.
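The abstract describes a two-step pipeline: invert an image into a GAN latent, then optimize that latent so the generated image's CLIP embedding moves toward a text prompt. The following is a minimal sketch of that optimization structure only, under loudly stated assumptions: G and E are hypothetical random linear stand-ins for the generator and the CLIP image encoder, the "text embedding" is a random vector, and the loss (cosine distance plus a squared L2 term keeping the latent near its inverted value, similar in spirit to StyleCLIP's latent optimization) is an assumption, not TD-GEM's published objective. The inversion step is a plain least-squares gradient descent; it does not reproduce e4e's encoder or PTI's generator fine-tuning.

```python
import numpy as np

# Hypothetical toy stand-ins, NOT TD-GEM's networks:
# G plays the generator (latent -> image), E plays the CLIP image encoder
# (image -> embedding). Both are random linear maps here.
rng = np.random.default_rng(0)
D_LATENT, D_IMAGE, D_EMBED = 8, 32, 16
G = rng.normal(size=(D_IMAGE, D_LATENT))
E = rng.normal(size=(D_EMBED, D_IMAGE))


def cosine(a, b):
    """Cosine similarity, the usual way CLIP embeddings are compared."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


def invert(x, steps=500):
    """Step 1 (toy inversion): gradient descent on ||G w - x||^2."""
    lr = 0.5 / np.linalg.norm(G, 2) ** 2  # stable step for this quadratic
    w = np.zeros(D_LATENT)
    for _ in range(steps):
        w -= lr * 2 * G.T @ (G @ w - x)
    return w


def edit(w0, text_emb, lam=0.1, steps=2000, lr=1e-3):
    """Step 2 (toy CLIP guidance, assumed loss): gradient descent on
    L(w) = (1 - cos(E G w, t)) + lam * ||w - w0||^2.
    The L2 term keeps w near the inverted latent w0."""
    A = E @ G  # composite map: latent -> "CLIP" embedding
    t_hat = text_emb / np.linalg.norm(text_emb)
    w = w0.copy()
    for _ in range(steps):
        u = A @ w
        nu = np.linalg.norm(u)
        # Analytic gradient of cos(u, t) with respect to u.
        dcos_du = t_hat / nu - (u @ t_hat) * u / nu**3
        grad = -A.T @ dcos_du + 2 * lam * (w - w0)
        w -= lr * grad
    return w


if __name__ == "__main__":
    w_true = rng.normal(size=D_LATENT)
    w0 = invert(G @ w_true)          # "real" image -> latent
    t = rng.normal(size=D_EMBED)     # stand-in for a text-prompt embedding
    w_edit = edit(w0, t)
    print("cos before:", cosine(E @ G @ w0, t))
    print("cos after: ", cosine(E @ G @ w_edit, t))
```

Raising `lam` trades alignment with the prompt for staying closer to the original latent, which is the knob such a proximity term provides for keeping unrelated image content unchanged.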

updated: Mon May 29 2023 14:31:54 GMT+0000 (UTC)

published: Mon May 29 2023 14:31:54 GMT+0000 (UTC)

arXiv
