Partial Visual-Semantic Embedding: Fashion Intelligence System with Sensitive Part-by-Part Learning

Ryotaro Shimizu; Takuma Nakamura; Masayuki Goto

In this study, we propose the Fashion Intelligence System, a technology based on the visual-semantic embedding (VSE) model that quantifies abstract and complex expressions unique to fashion, such as "casual," "adult-casual," and "office-casual," and supports users' understanding of fashion. However, the existing VSE model does not handle images composed of multiple parts, such as hair, tops, pants, skirts, and shoes. We therefore propose partial VSE, which enables sensitive learning for each part of a fashion coordinate (outfit). The proposed model learns the embedded representations part by part. It retains the practical functionalities of existing models while enabling two tasks that conventional models could not perform: image retrieval in which only the specified parts are changed, and image reordering that focuses on the specified parts. Through both qualitative and quantitative evaluation experiments, we show that the proposed model outperforms conventional models without increasing computational complexity.
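The abstract does not describe the model's architecture, so the following is only a toy illustration of the two tasks it names (retrieval with one part changed, and reordering by one part), not the authors' method. It assumes, hypothetically, that each outfit embedding is a concatenation of one sub-vector per part; the layout, the dimension `DIM_PER_PART`, the helper names, and the random embeddings are all assumptions made for this sketch.

```python
import numpy as np

# Hypothetical layout: each outfit embedding is the concatenation of one
# sub-vector per part. Part names come from the abstract; the dimension
# and the concatenation layout are assumptions for illustration only.
PARTS = ["hair", "tops", "pants", "skirts", "shoes"]
DIM_PER_PART = 4


def part_slice(part):
    """Return the slice of the full embedding that belongs to `part`."""
    i = PARTS.index(part)
    return slice(i * DIM_PER_PART, (i + 1) * DIM_PER_PART)


def swap_part(query, donor, part):
    """Copy `query`, replacing only the sub-vector of `part` with the donor's."""
    out = query.copy()
    out[part_slice(part)] = donor[part_slice(part)]
    return out


def rank_by_part(query, gallery, part):
    """Order gallery rows by cosine similarity to `query` on one part only."""
    s = part_slice(part)
    q = query[s] / np.linalg.norm(query[s])
    g = gallery[:, s] / np.linalg.norm(gallery[:, s], axis=1, keepdims=True)
    return np.argsort(-(g @ q))


def retrieve(target, gallery):
    """Return the index of the gallery row closest to `target` (cosine)."""
    t = target / np.linalg.norm(target)
    g = gallery / np.linalg.norm(gallery, axis=1, keepdims=True)
    return int(np.argmax(g @ t))


rng = np.random.default_rng(0)
gallery = rng.normal(size=(6, len(PARTS) * DIM_PER_PART))  # toy random embeddings
query, donor = gallery[0], gallery[1]

# Task 1: build a retrieval target that differs from the query only in "shoes".
target = swap_part(query, donor, "shoes")
best_match = retrieve(target, gallery)

# Task 2: reorder the gallery by similarity of the "shoes" part alone.
order = rank_by_part(donor, gallery, "shoes")
```

In this toy setup, `target` keeps every sub-vector of `query` except the one for shoes, and `order` ranks gallery items using the shoes sub-vector only, so the other parts have no influence on the ordering.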

updated: Sat Nov 12 2022 15:36:14 GMT+0000 (UTC)

published: Sat Nov 12 2022 15:36:14 GMT+0000 (UTC)

arXiv
