Contrastive learning of Class-agnostic Activation Map for Weakly Supervised Object Localization and Semantic Segmentation

Jinheng Xie; Jianfeng Xiang; Junliang Chen; Xianxu Hou; Xiaodong Zhao; Linlin Shen

Class activation maps (CAMs) generated by image classification networks are widely used for weakly supervised object localization (WSOL) and semantic segmentation (WSSS), but such classifiers usually focus only on the most discriminative object regions. In this paper, we propose Contrastive learning for Class-agnostic Activation Map (C^2AM) generation, which uses only unlabeled image data and requires no image-level supervision. The core idea comes from two observations: i) the semantic information of foreground objects usually differs from that of their backgrounds; ii) foreground objects with similar appearance, or backgrounds with similar color/texture, have similar representations in the feature space. We form positive and negative pairs based on these relations and, using a novel contrastive loss, force the network to disentangle foreground and background with a class-agnostic activation map. Because the network is guided to discriminate foreground from background across images, the class-agnostic activation maps learned by our approach cover more complete object regions. From C^2AM, we extract class-agnostic object bounding boxes for object localization, as well as background cues that refine the CAMs generated by a classification network for semantic segmentation. Extensive experiments on the CUB-200-2011, ImageNet-1K, and PASCAL VOC2012 datasets show that both WSOL and WSSS benefit from the proposed C^2AM.
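The pairing scheme described in the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation: the function names, the activation-weighted pooling, the clipping constant, and the unweighted log-based loss terms are all assumptions chosen for illustration, and the features are assumed to be non-negative (for example, taken after a ReLU) so that cosine similarities fall in [0, 1]. The sketch treats foreground-background pairs as negatives and foreground-foreground and background-background pairs across images as positives.

```python
import numpy as np

def split_foreground_background(features, activation):
    """Pool features into foreground and background vectors.

    features:   (N, C, H, W) feature maps (assumed non-negative).
    activation: (N, H, W) class-agnostic activation map in [0, 1].
    Returns two (N, C) arrays: foreground and background representations.
    """
    n, c, h, w = features.shape
    flat = features.reshape(n, c, h * w)
    act = activation.reshape(n, 1, h * w)
    fg = (flat * act).sum(axis=2) / (h * w)          # weighted by the map
    bg = (flat * (1.0 - act)).sum(axis=2) / (h * w)  # weighted by its complement
    return fg, bg

def cosine_similarity_matrix(a, b, eps=1e-8):
    """Pairwise cosine similarity between the rows of a and the rows of b."""
    a = a / (np.linalg.norm(a, axis=1, keepdims=True) + eps)
    b = b / (np.linalg.norm(b, axis=1, keepdims=True) + eps)
    return a @ b.T

def contrastive_fg_bg_loss(fg, bg, eps=1e-6):
    """Illustrative contrastive loss over a batch of at least two images.

    Negative pairs: every foreground against every background.
    Positive pairs: foreground-foreground and background-background
    across different images (the diagonal is excluded).
    """
    n = fg.shape[0]
    off_diag = ~np.eye(n, dtype=bool)

    # Push foreground and background representations apart.
    s_neg = np.clip(cosine_similarity_matrix(fg, bg), eps, 1.0 - eps)
    loss_neg = -np.log(1.0 - s_neg).mean()

    # Pull foregrounds together, and backgrounds together.
    s_ff = np.clip(cosine_similarity_matrix(fg, fg), eps, 1.0 - eps)[off_diag]
    s_bb = np.clip(cosine_similarity_matrix(bg, bg), eps, 1.0 - eps)[off_diag]
    loss_pos = -0.5 * (np.log(s_ff).mean() + np.log(s_bb).mean())

    return loss_neg + loss_pos
```

In a training setup, the activation map would be predicted by a network and the loss minimized by gradient descent; the NumPy version above only shows how the pairs are formed and scored. An activation map that cleanly separates two feature patterns yields a much lower loss than a uniform map that mixes them.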

updated: Thu Dec 22 2022 13:16:09 GMT+0000 (UTC)

published: Fri Mar 25 2022 08:46:24 GMT+0000 (UTC)

arXiv
