CP2: Copy-Paste Contrastive Pretraining for Semantic Segmentation

Feng Wang; Huiyu Wang; Chen Wei; Alan Yuille; Wei Shen



Recent advances in self-supervised contrastive learning yield good image-level representations, which favor classification tasks but usually neglect pixel-level detail, leading to unsatisfactory transfer performance on dense prediction tasks such as semantic segmentation. In this work, we propose a pixel-wise contrastive learning method called CP2 (Copy-Paste Contrastive Pretraining), which facilitates both image- and pixel-level representation learning and is therefore more suitable for downstream dense prediction tasks. In detail, we copy-paste a random crop from an image (the foreground) onto different background images and pretrain a semantic segmentation model with the objective of 1) distinguishing the foreground pixels from the background pixels, and 2) identifying the composed images that share the same foreground. Experiments show the strong performance of CP2 in downstream semantic segmentation: by finetuning CP2 pretrained models on PASCAL VOC 2012, we obtain 78.6% mIoU with a ResNet-50 and 79.5% with a ViT-S.
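The copy-paste composition and the foreground/background pixel objective described above can be sketched in NumPy. This is only a minimal illustration under assumptions. The crop size, the uniformly random paste location, the temperature `tau`, and the multi-positive InfoNCE form of the loss are illustrative choices, and the helper names (`random_crop`, `copy_paste`, `pixel_contrastive_loss`) are hypothetical rather than the authors' code. In the sketch, each foreground pixel of one composite is pulled toward the foreground pixels of a second composite that shares the same crop and pushed away from that composite's background pixels. In real pretraining the per-pixel features would come from a segmentation network. Here they are plain arrays.

```python
import numpy as np

def random_crop(img, size, rng):
    """Take a random (ch, cw) crop from img; this crop plays the role of the foreground."""
    h, w = img.shape[:2]
    ch, cw = size
    top = rng.integers(0, h - ch + 1)
    left = rng.integers(0, w - cw + 1)
    return img[top:top + ch, left:left + cw]

def copy_paste(fg_crop, background, rng):
    """Paste fg_crop at a random location on background.

    Returns the composite image and a boolean mask that is True on pasted (foreground) pixels.
    The uniform paste location is an assumption of this sketch.
    """
    H, W = background.shape[:2]
    ch, cw = fg_crop.shape[:2]
    top = rng.integers(0, H - ch + 1)
    left = rng.integers(0, W - cw + 1)
    composite = background.copy()
    composite[top:top + ch, left:left + cw] = fg_crop
    mask = np.zeros((H, W), dtype=bool)
    mask[top:top + ch, left:left + cw] = True
    return composite, mask

def pixel_contrastive_loss(feat_q, mask_q, feat_k, mask_k, tau=0.2):
    """Multi-positive InfoNCE over pixels (an assumed loss form, not necessarily CP2's exact one).

    feat_q, feat_k: (N, D) L2-normalized per-pixel features of two composites.
    mask_q, mask_k: (N,) boolean foreground masks.
    Foreground keys are positives; background keys are negatives.
    """
    q = feat_q[mask_q]                                   # (Nq, D) foreground queries
    logits = q @ feat_k.T / tau                          # (Nq, N) similarities to all keys
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # Average log-likelihood of the foreground (positive) keys for each query pixel.
    pos_log_prob = log_prob[:, mask_k].mean(axis=1)
    return -pos_log_prob.mean()

# Usage: paste one crop onto two different backgrounds, giving two composites
# that share the same foreground.
rng = np.random.default_rng(0)
image, bg_a, bg_b = (rng.random((64, 64, 3)) for _ in range(3))
crop = random_crop(image, (32, 32), rng)
comp_a, mask_a = copy_paste(crop, bg_a, rng)
comp_b, mask_b = copy_paste(crop, bg_b, rng)
```

Using a foreground mask keeps the pixel labels free. The pasted region is known exactly, so no annotation is needed to decide which pixels are positives. The second, image-level objective (identifying composites that share a foreground) is not shown here.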

updated: Tue Mar 22 2022 13:21:49 GMT+0000 (UTC)

published: Tue Mar 22 2022 13:21:49 GMT+0000 (UTC)

arXiv
