CompoDiff: Versatile Composed Image Retrieval With Latent Diffusion

Geonmo Gu; Sanghyuk Chun; Wonjae Kim; HeeJae Jun; Yoohoon Kang; Sangdoo Yun



This paper proposes CompoDiff, a novel diffusion-based model that solves Composed Image Retrieval (CIR) with latent diffusion. It also presents a newly created dataset of 18 million triplets, each consisting of a reference image, a condition, and the corresponding target image, for training the model. CompoDiff not only achieves a new zero-shot state of the art on CIR benchmarks such as FashionIQ but also enables more versatile CIR by accepting conditions that existing CIR methods do not support, such as negative text and image mask conditions. In addition, CompoDiff features lie in the intact CLIP embedding space, so they can be used directly by any existing model that exploits the CLIP space. The code and dataset used for training, as well as the pre-trained weights, are available at https://github.com/navervision/CompoDiff.
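Because the abstract states that CompoDiff features stay in the CLIP embedding space, the final retrieval step can in principle reuse the ordinary nearest-neighbour search used with CLIP embeddings. The sketch below is a minimal illustration under stated assumptions, not the authors' implementation. It assumes that `query` is a composed embedding already produced by CompoDiff and that `gallery` holds CLIP image embeddings of the same dimension. The helper names `l2_normalize` and `rank_gallery` are hypothetical and are not taken from the CompoDiff repository. Gallery items are ranked by cosine similarity using only the Python standard library.

```python
import math
from typing import List, Sequence, Tuple


def l2_normalize(vec: Sequence[float]) -> List[float]:
    """Scale a vector to unit length so that dot products equal cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def rank_gallery(
    query: Sequence[float],
    gallery: Sequence[Sequence[float]],
    top_k: int = 3,
) -> List[Tuple[int, float]]:
    """Return (gallery index, cosine similarity) pairs for the top_k closest items.

    Hypothetical helper: `query` stands in for a composed embedding from
    CompoDiff, and `gallery` stands in for CLIP image embeddings. Both are
    assumed to share one embedding space and one dimensionality.
    """
    q = l2_normalize(query)
    scored = []
    for idx, emb in enumerate(gallery):
        if len(emb) != len(q):
            raise ValueError("query and gallery embeddings must share a dimension")
        g = l2_normalize(emb)
        scored.append((idx, sum(a * b for a, b in zip(q, g))))
    # Highest cosine similarity first.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


if __name__ == "__main__":
    # Toy 3-dimensional vectors in place of real CLIP-sized embeddings.
    query = [1.0, 0.2, 0.0]
    gallery = [[0.0, 1.0, 0.0], [1.0, 0.0, 0.0], [0.9, 0.3, 0.1]]
    for idx, score in rank_gallery(query, gallery):
        print(idx, round(score, 4))
```

With real data, a vector index would normally replace the linear scan. The ranking logic stays the same because retrieval only needs the query and the gallery to share one embedding space.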

updated: Tue Mar 21 2023 15:06:35 GMT+0000 (UTC)

published: Tue Mar 21 2023 15:06:35 GMT+0000 (UTC)

arXiv
