Semantic Granularity Metric Learning for Visual Search

Dipu Manandhar; Muhammet Bastan; Kim-Hui Yap

Deep metric learning has shown promising results in various applications such as identification, retrieval and recognition. Existing methods often do not consider the different granularities of visual similarity. However, in many application domains, images exhibit similarity at multiple granularities of visual semantic concepts; e.g., fashion items can be similar at the level of the exact same clothing instance, a similar look/design, or a common category. Therefore, the training image triplets/pairs used for metric learning inherently carry different degrees of information. However, existing methods often treat them as equally important during training. This hinders capturing the underlying granularities of feature similarity required for effective visual search. In view of this, we propose a new deep semantic granularity metric learning (SGML) method that develops the novel idea of leveraging the attribute semantic space to capture different granularities of similarity, and then integrates this information into deep metric learning. The proposed method simultaneously learns image attributes and embeddings using multitask CNNs. The two tasks are not only jointly optimized but are further linked by semantic granularity similarity mappings that leverage the correlations between the tasks. To this end, we propose a new soft-binomial deviance loss that effectively integrates the degree of information in training samples, which helps to capture visual similarity at multiple granularities. Compared to recent ensemble-based methods, our framework is conceptually elegant, computationally simple and provides better performance. We perform extensive experiments on benchmark metric learning datasets and demonstrate that our method outperforms recent state-of-the-art methods, e.g., a 1-4.5% improvement in Recall@1 over the previous state-of-the-art methods [1], [2] on the DeepFashion In-Shop dataset.
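The abstract does not give the formula of the soft-binomial deviance loss, so the sketch below is only one plausible reading under stated assumptions, not the authors' implementation. It assumes that the "degree of information" of an image pair can be expressed as a soft target derived from attribute similarity, and plugs that target into the logistic (binomial deviance) form ln(1 + exp(-z)) with z = alpha * (s - beta) on the cosine similarity s. The function names (`soft_binomial_deviance`, `attribute_loss`, `joint_loss`), the cosine-based granularity score `g`, and the hyperparameters `alpha`, `beta`, `lam`, `mu` are all hypothetical.

```python
import numpy as np


def cosine_sim(x):
    """Pairwise cosine similarity between the rows of x, shape [n, d] -> [n, n]."""
    unit = x / np.maximum(np.linalg.norm(x, axis=1, keepdims=True), 1e-12)
    return unit @ unit.T


def soft_binomial_deviance(emb, ids, attrs, alpha=2.0, beta=0.5, lam=0.5):
    """Hypothetical soft-target binomial deviance over all pairs in a batch.

    emb:   [n, d] image embeddings
    ids:   [n] numpy array of instance labels (same id = exact same item)
    attrs: [n, k] non-negative attribute vectors (labels or predicted probabilities)
    """
    s = cosine_sim(emb)                    # embedding similarity s_ij
    g = cosine_sim(attrs.astype(float))    # assumed granularity score g_ij in [0, 1]
    same = ids[:, None] == ids[None, :]
    # Assumed soft target: 1 for the same instance, lam * g_ij for different
    # instances, so attribute-sharing negatives are pushed away less.
    y = np.where(same, 1.0, lam * g)
    z = alpha * (s - beta)                 # scaled, shifted similarity
    # Binary cross-entropy with logits: softplus(z) - y * z.
    # y = 1 gives log(1 + exp(-z)); y = 0 gives log(1 + exp(z)).
    pair_loss = np.logaddexp(0.0, z) - y * z
    off_diag = ~np.eye(len(ids), dtype=bool)   # ignore self-pairs
    return pair_loss[off_diag].mean()


def attribute_loss(logits, labels):
    """Multi-label attribute BCE with logits, averaged over samples and attributes."""
    return np.mean(np.logaddexp(0.0, logits) - labels * logits)


def joint_loss(emb, attr_logits, ids, attr_labels, mu=1.0):
    """Hypothetical multitask objective: embedding loss + weighted attribute loss.

    Granularity scores are computed from predicted attribute probabilities,
    one assumed way to let the attribute task inform the embedding task.
    """
    attr_probs = 1.0 / (1.0 + np.exp(-attr_logits))
    return (soft_binomial_deviance(emb, ids, attr_probs)
            + mu * attribute_loss(attr_logits, attr_labels))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    ids = np.array([0, 0, 1, 1, 2, 2])         # three items, two images each
    emb = rng.normal(size=(6, 8))              # stand-in for CNN embeddings
    attr_logits = rng.normal(size=(6, 5))      # stand-in for attribute-head outputs
    attr_labels = rng.integers(0, 2, size=(6, 5)).astype(float)
    print(joint_loss(emb, attr_logits, ids, attr_labels))
```

Writing the pair term as cross-entropy with soft targets means that with `lam=0` it reduces to a plain (hard) binomial deviance over positive and negative pairs; the soft target only changes how strongly attribute-sharing negatives are pushed apart. In an actual multitask CNN, `emb` and `attr_logits` would come from two heads on a shared backbone and the loss would be minimized with an autodiff framework; NumPy is used here only to evaluate it.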

updated: Thu Nov 14 2019 11:36:16 GMT+0000 (UTC)

published: Thu Nov 14 2019 11:36:16 GMT+0000 (UTC)

arXiv