VICTOR: Visual Incompatibility Detection with Transformers and Fashion-specific contrastive pre-training

Stefanos-Iordanis Papadopoulos; Christos Koutlis; Symeon Papadopoulos; Ioannis Kompatsiaris

For a fashion outfit to be considered aesthetically pleasing, the garments that constitute it need to be compatible in terms of visual aspects such as style, category and color. Previous works have defined visual compatibility as a binary classification task, with the items in an outfit being considered either fully compatible or fully incompatible. However, this is not applicable to Outfit Maker applications, where users create their own outfits and need to know which specific items may be incompatible with the rest of the outfit. To address this, we propose the Visual InCompatibility TransfORmer (VICTOR), which is optimized for two tasks: 1) overall compatibility as regression and 2) the detection of mismatching items. We also utilize fashion-specific contrastive language-image pre-training for fine-tuning computer vision neural networks on fashion imagery. We build upon the Polyvore outfit benchmark to generate partially mismatching outfits, creating a new dataset termed Polyvore-MISFITs that is used to train VICTOR. A series of ablation and comparative analyses show that the proposed architecture can compete with and even surpass the current state of the art on Polyvore datasets while reducing the instance-wise floating-point operations by 88%, striking a balance between high performance and efficiency. We release our code at https://github.com/stevejpapad/Visual-InCompatibility-Transformer
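The abstract does not spell out how partially mismatching outfits are generated, so the following is only a minimal sketch of one plausible procedure, not the authors' implementation. It assumes that a compatible outfit is a list of (item_id, category) pairs, that mismatches are created by swapping some items for random items of the same category, and that the regression target is the fraction of items left untouched. The function name `make_misfit`, the `catalog` mapping, and the scoring rule are all hypothetical.

```python
import random


def make_misfit(outfit, catalog, num_swaps, rng):
    """Build a partially mismatching outfit (hypothetical procedure).

    outfit:    list of (item_id, category) pairs from a compatible outfit
    catalog:   dict mapping category -> list of item ids (assumed structure)
    num_swaps: how many items to replace with random same-category items
    rng:       a random.Random instance, for reproducibility

    Returns (new_outfit, mismatch_labels, score), where mismatch_labels[i]
    is 1 if item i was replaced, and score is the assumed regression
    target: the fraction of items that were NOT replaced.
    """
    if not 0 <= num_swaps <= len(outfit):
        raise ValueError("num_swaps must be between 0 and len(outfit)")

    # Pick which positions in the outfit will be replaced.
    swap_positions = set(rng.sample(range(len(outfit)), num_swaps))

    new_outfit, mismatch_labels = [], []
    for pos, (item_id, category) in enumerate(outfit):
        if pos in swap_positions:
            # Replacement must share the category but differ from the original.
            candidates = [i for i in catalog[category] if i != item_id]
            if not candidates:
                raise ValueError(f"no replacement available for {category!r}")
            new_outfit.append((rng.choice(candidates), category))
            mismatch_labels.append(1)
        else:
            new_outfit.append((item_id, category))
            mismatch_labels.append(0)

    # Assumed scoring rule: share of original (compatible) items remaining.
    score = 1.0 - num_swaps / len(outfit)
    return new_outfit, mismatch_labels, score


# Toy data, invented purely for illustration.
catalog = {
    "top": ["t1", "t2", "t3"],
    "bottom": ["b1", "b2"],
    "shoes": ["s1", "s2", "s3"],
    "bag": ["g1", "g2"],
}
outfit = [("t1", "top"), ("b1", "bottom"), ("s1", "shoes"), ("g1", "bag")]
misfit, labels, score = make_misfit(outfit, catalog, 2, random.Random(0))
```

Under these assumptions, `labels` would serve as the per-item target for the mismatch detection task and `score` as the target for the overall compatibility regression task, which mirrors the two tasks named in the abstract.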

updated: Thu Sep 08 2022 06:58:05 GMT+0000 (UTC)

published: Wed Jul 27 2022 11:18:55 GMT+0000 (UTC)

arXiv
