RegionCL: Can Simple Region Swapping Contribute to Contrastive Learning?

Yufei Xu; Qiming Zhang; Jing Zhang; Dacheng Tao


Self-supervised learning (SSL) methods have achieved significant success by maximizing the mutual information between two augmented views, with cropping being a popular augmentation technique. Cropped regions are widely used to construct positive pairs, while the regions left over after cropping have rarely been explored in existing methods, even though the two together constitute the same image instance and both contribute to describing its category. In this paper, we make the first attempt to demonstrate the importance of both regions in cropping from a complete perspective and propose a simple yet effective pretext task called Region Contrastive Learning (RegionCL). Specifically, given two different images, we randomly crop a region of the same size (called the paste view) from each image and swap them, composing two new images together with the respective remaining regions (called the canvas view). Contrastive pairs can then be constructed efficiently according to two simple criteria: each view is (1) positive with views augmented from the same original image and (2) negative with views augmented from other images. With minor modifications to popular SSL methods, RegionCL exploits these abundant pairs and helps the model distinguish the region features from both canvas and paste views, thereby learning better visual representations. Experiments on ImageNet, MS COCO, and Cityscapes demonstrate that RegionCL improves MoCo v2, DenseCL, and SimSiam by large margins and achieves state-of-the-art performance on classification, detection, and segmentation tasks. The code will be available at https://github.com/Annbless/RegionCL.git.
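The region swap and the pairing criteria described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The function names `swap_regions` and `positive_mask` are hypothetical, and placing the box at the same coordinates in both images is an assumption made here for simplicity; the abstract only states that the two regions have the same size.

```python
import numpy as np


def swap_regions(img_a, img_b, box_h, box_w, rng):
    """Swap a same-size box between two images of identical shape (H x W x C).

    Assumption (not from the abstract): the box sits at the same random
    location in both images.
    """
    assert img_a.shape == img_b.shape
    h, w = img_a.shape[:2]
    # Pick the top-left corner so the box fits fully inside the image.
    top = int(rng.integers(0, h - box_h + 1))
    left = int(rng.integers(0, w - box_w + 1))
    rows = slice(top, top + box_h)
    cols = slice(left, left + box_w)

    mixed_a = img_a.copy()
    mixed_b = img_b.copy()
    mixed_a[rows, cols] = img_b[rows, cols]  # canvas from A, paste view from B
    mixed_b[rows, cols] = img_a[rows, cols]  # canvas from B, paste view from A
    return mixed_a, mixed_b, (top, left, box_h, box_w)


def positive_mask(source_ids):
    """Boolean matrix: entry (i, j) is True when views i and j come from the
    same original image (positive pair). Off-diagonal False entries are
    negative pairs; the diagonal (a view with itself) is excluded."""
    ids = np.asarray(source_ids)
    mask = ids[:, None] == ids[None, :]
    np.fill_diagonal(mask, False)
    return mask


# Toy example: image A is all zeros, image B is all 255.
rng = np.random.default_rng(0)
img_a = np.zeros((8, 8, 3), dtype=np.uint8)
img_b = np.full((8, 8, 3), 255, dtype=np.uint8)
mixed_a, mixed_b, box = swap_regions(img_a, img_b, 4, 4, rng)

# Four region views, labelled by the original image they came from:
# canvas of mixed_a (A=0), paste of mixed_a (B=1),
# canvas of mixed_b (B=1), paste of mixed_b (A=0).
mask = positive_mask([0, 1, 1, 0])
```

In this toy setup, the canvas view of the first composite and the paste view of the second both originate from image A, so they form a positive pair, while each is negative with the two views that originate from image B. How these pairs enter a specific loss (MoCo v2, DenseCL, or SimSiam) is left to the paper and its released code.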

updated: Wed Nov 24 2021 07:19:46 GMT+0000 (UTC)

published: Wed Nov 24 2021 07:19:46 GMT+0000 (UTC)

arXiv
