Beyond Single Instance Multi-view Unsupervised Representation Learning

Xiangxiang Chu; Xiaohang Zhan; Xiaolin Wei

Recent unsupervised contrastive representation learning follows a Single Instance Multi-view (SIM) paradigm, in which positive pairs are usually constructed with intra-image data augmentation. In this paper, we propose an effective approach called Beyond Single Instance Multi-view (BSIM). Specifically, we impose a more accurate instance discrimination capability by measuring the joint similarity between two randomly sampled instances and their mixture, which we call spurious-positive pairs. We believe that learning joint similarity helps to improve performance because the encoded features are distributed more evenly in the latent space. We apply it as an orthogonal improvement to unsupervised contrastive representation learning, including the current leading methods SimCLR, MoCo, and BYOL. We evaluate the learned representations on many downstream benchmarks, such as linear classification on ImageNet-1k and PASCAL VOC 2007 and object detection on MS COCO 2017 and VOC. We obtain substantial gains over prior art on almost all of these tasks.
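The joint-similarity idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not specify the mixing operator or the loss, so the sketch assumes a mixup-style linear blend with coefficient `lam`, an InfoNCE-style contrastive loss, and an identity encoder (vectors are mixed and compared directly). All function names here are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def mix(x_i, x_j, lam):
    """Blend two instances linearly (assumed mixing operator)."""
    return [lam * a + (1.0 - lam) * b for a, b in zip(x_i, x_j)]


def info_nce(query, positive, negatives, temperature=0.2):
    """InfoNCE loss for one query: -log softmax score of the positive."""
    logits = [cosine(query, positive) / temperature]
    logits += [cosine(query, n) / temperature for n in negatives]
    peak = max(logits)  # subtract the max for numerical stability
    log_sum = peak + math.log(sum(math.exp(l - peak) for l in logits))
    return log_sum - logits[0]


def joint_similarity_loss(z_mix, z_i, z_j, negatives, lam, temperature=0.2):
    """Weight the two spurious-positive terms by each source's mixing share.

    Assumption: the mixture should be similar to z_i in proportion lam
    and to z_j in proportion (1 - lam).
    """
    loss_i = info_nce(z_mix, z_i, negatives, temperature)
    loss_j = info_nce(z_mix, z_j, negatives, temperature)
    return lam * loss_i + (1.0 - lam) * loss_j


# Usage: two toy "instances", their mixture, and one unrelated negative.
x_i, x_j = [1.0, 0.0], [0.0, 1.0]
lam = 0.5
z_mix = mix(x_i, x_j, lam)
loss = joint_similarity_loss(z_mix, x_i, x_j, [[-1.0, -1.0]], lam)
```

In this sketch, setting `lam` to 1 reduces the loss to an ordinary single-positive contrastive term, which is one way to see the approach as an extension of the SIM paradigm rather than a replacement for it.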

updated: Thu Nov 26 2020 15:43:27 GMT+0000 (UTC)

published: Thu Nov 26 2020 15:43:27 GMT+0000 (UTC)

arXiv
