Modality-Aware Triplet Hard Mining for Zero-shot Sketch-Based Image Retrieval

Zongheng Huang; YiFan Sun; Chuchu Han; Changxin Gao; Nong Sang

This paper tackles the Zero-Shot Sketch-Based Image Retrieval (ZS-SBIR) problem from the viewpoint of cross-modality metric learning. The task has two characteristics: 1) the zero-shot setting requires a metric space with good within-class compactness and between-class discrepancy in order to recognize novel classes, and 2) the sketch query and the photo gallery are in different modalities. The metric learning viewpoint benefits ZS-SBIR in two respects. First, it facilitates improvement through recent good practices in deep metric learning (DML). By combining two fundamental learning approaches in DML, namely classification training and pairwise training, we set up a strong baseline for ZS-SBIR. Without bells and whistles, this baseline achieves competitive retrieval accuracy. Second, it provides the insight that properly suppressing the modality gap is critical. To this end, we design a novel method named Modality-Aware Triplet Hard Mining (MATHM). MATHM enhances the baseline with three types of pairwise learning, namely cross-modality sample pairs, within-modality sample pairs, and their combination. We also design an adaptive weighting method that dynamically balances these three components during training. Experimental results confirm that MATHM brings another round of significant improvement over the strong baseline and sets a new state of the art. For example, on the TU-Berlin dataset, we achieve 47.88+2.94% mAP@all and 58.28+2.34% Prec@100. Code will be publicly available.
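
To make the three types of pairwise learning concrete, the following is a minimal NumPy sketch of batch-hard triplet mining restricted by a modality mask. It is an illustration under assumptions, not the authors' implementation: the function names (`pairwise_dist`, `batch_hard_triplet`, `modality_aware_losses`), the Euclidean distance, the margin value, and the batch-hard mining rule are all hypothetical choices, and the adaptive weighting of the three terms is not specified in the abstract, so it is left out.

```python
import numpy as np


def pairwise_dist(x):
    """Euclidean distance between every pair of rows in x."""
    sq = np.sum(x ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (x @ x.T)
    return np.sqrt(np.maximum(d2, 0.0))


def batch_hard_triplet(dist, labels, pair_mask, margin=0.3):
    """Batch-hard triplet loss over the pairs allowed by pair_mask.

    For each anchor, take the farthest allowed positive and the closest
    allowed negative. Anchors lacking either one are skipped.
    """
    same = labels[:, None] == labels[None, :]
    not_self = ~np.eye(len(labels), dtype=bool)
    pos_mask = same & not_self & pair_mask
    neg_mask = ~same & pair_mask
    hardest_pos = np.where(pos_mask, dist, -np.inf).max(axis=1)
    hardest_neg = np.where(neg_mask, dist, np.inf).min(axis=1)
    valid = np.isfinite(hardest_pos) & np.isfinite(hardest_neg)
    if not valid.any():
        return 0.0
    losses = np.maximum(hardest_pos[valid] - hardest_neg[valid] + margin, 0.0)
    return float(losses.mean())


def modality_aware_losses(features, labels, is_sketch, margin=0.3):
    """Return the three triplet terms (assumed structure, see lead-in)."""
    dist = pairwise_dist(features)
    same_mod = is_sketch[:, None] == is_sketch[None, :]
    all_pairs = np.ones_like(same_mod, dtype=bool)
    return {
        "cross": batch_hard_triplet(dist, labels, ~same_mod, margin),
        "within": batch_hard_triplet(dist, labels, same_mod, margin),
        "combined": batch_hard_triplet(dist, labels, all_pairs, margin),
    }


if __name__ == "__main__":
    # Toy batch: one sketch and one photo for each of two classes.
    feats = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 3.0], [1.0, 3.0]])
    labels = np.array([0, 0, 1, 1])
    is_sketch = np.array([True, False, True, False])
    print(modality_aware_losses(feats, labels, is_sketch, margin=3.0))
```

In this sketch the three terms differ only in which sample pairs are eligible for mining, so a training loop could sum them with whatever weights it chooses. In the toy batch the within-modality term is zero because no class has two samples of the same modality.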

updated: Wed Dec 15 2021 08:36:44 GMT+0000 (UTC)

published: Wed Dec 15 2021 08:36:44 GMT+0000 (UTC)

arXiv
