Dense Contrastive Learning for Self-Supervised Visual Pre-Training

Xinlong Wang; Rufeng Zhang; Chunhua Shen; Tao Kong; Lei Li

To date, most existing self-supervised learning methods are designed and optimized for image classification. The resulting pre-trained models can be sub-optimal for dense prediction tasks because of the discrepancy between image-level prediction and pixel-level prediction. To fill this gap, we aim to design an effective, dense self-supervised learning method that works directly at the level of pixels (or local features) by taking into account the correspondence between local features. We present dense contrastive learning, which implements self-supervised learning by optimizing a pairwise contrastive (dis)similarity loss at the pixel level between two views of input images. Compared to the baseline method MoCo-v2, our method introduces negligible computation overhead (less than 1% slower), yet demonstrates consistently superior performance when transferring to downstream dense prediction tasks, including object detection, semantic segmentation, and instance segmentation, and outperforms state-of-the-art methods by a large margin. Specifically, over the strong MoCo-v2 baseline, our method achieves significant improvements of 2.0% AP on PASCAL VOC object detection, 1.1% AP on COCO object detection, 0.9% AP on COCO instance segmentation, 3.0% mIoU on PASCAL VOC semantic segmentation, and 1.8% mIoU on Cityscapes semantic segmentation. Code is available at: https://git.io/AdelaiDet
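
The pixel-level contrastive loss described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `dense_contrastive_loss`, the InfoNCE form of the loss, the temperature value, and the rule that each local feature's positive is its most similar feature in the other view are all assumptions made for illustration. The abstract states only that the loss is pairwise, contrastive, and computed between local features of two views.

```python
import math


def normalize(vec):
    """Scale a feature vector to unit length (zero vectors are left unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def dense_contrastive_loss(view1, view2, negatives, temperature=0.2):
    """Average InfoNCE-style loss over the local features of view1.

    view1, view2: lists of local feature vectors, one per spatial location,
                  taken from two augmented views of the same image.
    negatives:    list of feature vectors from other images (hypothetical input).
    temperature:  softmax temperature; 0.2 is an assumed value.
    """
    queries = [normalize(v) for v in view1]
    keys = [normalize(v) for v in view2]
    negs = [normalize(v) for v in negatives]

    total = 0.0
    for q in queries:
        # Assumed correspondence rule: the positive is the most similar
        # local feature in the other view (cosine similarity).
        positive = max(dot(q, k) for k in keys)
        logits = [positive / temperature] + [dot(q, n) / temperature for n in negs]
        # Numerically stable log-sum-exp for the softmax denominator.
        peak = max(logits)
        log_denominator = peak + math.log(sum(math.exp(l - peak) for l in logits))
        # Negative log-probability of the positive pair.
        total += log_denominator - logits[0]
    return total / len(queries)


if __name__ == "__main__":
    view_a = [[1.0, 0.0], [0.0, 1.0]]
    view_b = [[0.0, 1.0], [1.0, 0.0]]      # same content, locations permuted
    other = [[-1.0, 0.0], [0.0, -1.0]]     # features from unrelated images
    print(dense_contrastive_loss(view_a, view_b, other))
```

Because the positive is chosen per location rather than from a single pooled image vector, the loss stays low when the two views contain the same local content at different positions, which is the property that distinguishes a dense objective from an image-level one.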

updated: Wed Nov 18 2020 08:42:32 GMT+0000 (UTC)

published: Wed Nov 18 2020 08:42:32 GMT+0000 (UTC)

arXiv
