More Photos are All You Need: Semi-Supervised Learning for Fine-Grained Sketch Based Image Retrieval

Ayan Kumar Bhunia; Pinaki Nath Chowdhury; Aneeshan Sain; Yongxin Yang; Tao Xiang; Yi-Zhe Song

A fundamental challenge faced by existing Fine-Grained Sketch-Based Image Retrieval (FG-SBIR) models is data scarcity: model performance is largely bottlenecked by the lack of sketch-photo pairs. Whilst the number of photos can be easily scaled, each corresponding sketch still needs to be individually produced. In this paper, we aim to mitigate such an upper bound on sketch data, and study whether unlabelled photos alone (of which there are many) can be cultivated for performance gain. In particular, we introduce a novel semi-supervised framework for cross-modal retrieval that can additionally leverage large-scale unlabelled photos to account for data scarcity. At the centre of our semi-supervision design is a sequential photo-to-sketch generation model that aims to generate paired sketches for unlabelled photos. Importantly, we further introduce a discriminator-guided mechanism to guard against unfaithful generation, together with a distillation-loss-based regularizer to provide tolerance against noisy training samples. Last but not least, we treat generation and retrieval as two conjugate problems, where a joint learning procedure is devised for each module to mutually benefit from the other. Extensive experiments show that our semi-supervised model yields a significant performance boost over state-of-the-art supervised alternatives, as well as over existing methods that can exploit unlabelled photos for FG-SBIR.
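As a rough illustration of how labelled pairs and generated pseudo-pairs might be combined in one retrieval objective, the sketch below sums a supervised triplet term with a down-weighted term on generated sketches, plus a distillation-style penalty. The abstract does not specify the losses, so everything here is an assumption made for illustration: the triplet formulation, the squared-distance penalty, the function names, and the values of `margin`, `pseudo_weight` and `distill_weight` are all hypothetical and are not taken from the paper.

```python
import math

def l2(a, b):
    """Euclidean distance between two equal-length embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def triplet_loss(sketch, pos_photo, neg_photo, margin=0.2):
    """Hinge triplet loss (assumed): the sketch should be closer to its own
    photo than to a non-matching photo by at least `margin`."""
    return max(0.0, margin + l2(sketch, pos_photo) - l2(sketch, neg_photo))

def distillation_penalty(student, teacher):
    """Squared distance between the current embedding and the embedding from
    a frozen teacher (assumed here to be trained on labelled pairs only)."""
    return sum((s - t) ** 2 for s, t in zip(student, teacher))

def semi_supervised_loss(labelled, pseudo, pseudo_weight=0.5, distill_weight=0.1):
    """Combine supervised and pseudo-labelled terms.

    labelled: list of (sketch, pos_photo, neg_photo) embedding triplets
    pseudo:   list of (generated_sketch, pos_photo, neg_photo, teacher_sketch)
              where the sketch was generated for an unlabelled photo
    """
    sup = sum(triplet_loss(s, p, n) for s, p, n in labelled) / max(len(labelled), 1)
    unsup = 0.0
    for s, p, n, t in pseudo:
        # Generated sketches may be noisy, so the teacher term keeps their
        # embeddings from drifting too far from the labelled-only solution.
        unsup += triplet_loss(s, p, n) + distill_weight * distillation_penalty(s, t)
    unsup /= max(len(pseudo), 1)
    return sup + pseudo_weight * unsup

if __name__ == "__main__":
    labelled = [([0.0, 0.0], [0.0, 0.0], [1.0, 0.0])]
    pseudo = [([0.0, 0.0], [0.0, 0.0], [0.0, 0.0], [1.0, 0.0])]
    print(semi_supervised_loss(labelled, pseudo))
```

Down-weighting the pseudo-pair term is one simple way to express that generated sketches are less trustworthy than human-drawn ones. The discriminator-guided selection and the joint training of generator and retriever described in the abstract are not modelled here.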

updated: Thu Mar 25 2021 17:27:08 GMT+0000 (UTC)

published: Thu Mar 25 2021 17:27:08 GMT+0000 (UTC)

arXiv
