Extracting Class Activation Maps from Non-Discriminative Features as well

Zhaozheng Chen; Qianru Sun

Extracting class activation maps (CAM) from a classification model often results in poor coverage of foreground objects, i.e., only the discriminative region (e.g., the "head" of "sheep") is recognized, while the rest (e.g., the "leg" of "sheep") is mistakenly treated as background. The crux is that the weight of the classifier (used to compute CAM) captures only the discriminative features of objects. We tackle this by introducing a new computation method for CAM that explicitly captures non-discriminative features as well, thereby expanding CAM to cover whole objects. Specifically, we omit the last pooling layer of the classification model and perform clustering on all local features of an object class, where "local" means "at a spatial pixel position". We call the resultant K cluster centers local prototypes; they represent local semantics such as the "head", "leg", and "body" of "sheep". Given a new image of the class, we compare its unpooled features to every prototype, derive K similarity matrices, and then aggregate them into a heatmap (i.e., our CAM). Our CAM thus captures all local features of the class without discrimination. We evaluate it on the challenging task of weakly-supervised semantic segmentation (WSSS), and plug it into multiple state-of-the-art WSSS methods, such as MCTformer and AMN, by simply replacing their original CAM with ours. Our extensive experiments on standard WSSS benchmarks (PASCAL VOC and MS COCO) show the superiority of our method: consistent improvements with little computational overhead.
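The pipeline described in the abstract (cluster unpooled local features into K prototypes, compare a new image's features to each prototype, aggregate the K similarity matrices) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The abstract does not specify the clustering algorithm, the similarity measure, or the aggregation rule, so the plain k-means loop, cosine similarity, and ReLU-then-sum aggregation below are assumptions, as are all function names. A conventional classifier-weight CAM is included only for contrast.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-8):
    """Scale vectors along `axis` to unit length."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)


def kmeans_prototypes(local_feats, k, iters=20, seed=0):
    """Cluster local features (N, C) into k centers (k, C).

    Assumption: plain k-means; the abstract only says "clustering".
    """
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(local_feats), size=k, replace=False)
    centers = local_feats[idx].copy()
    for _ in range(iters):
        # Squared Euclidean distance from every feature to every center: (N, k)
        dist = ((local_feats[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        assign = dist.argmin(axis=1)
        for j in range(k):
            members = local_feats[assign == j]
            if len(members) > 0:  # keep the old center if a cluster is empty
                centers[j] = members.mean(axis=0)
    return centers


def classifier_cam(feature_map, class_weight):
    """Conventional CAM for contrast: weight channels by the classifier weight.

    feature_map: (C, H, W) unpooled features; class_weight: (C,).
    """
    cam = np.tensordot(class_weight, feature_map, axes=([0], [0]))  # (H, W)
    cam = np.maximum(cam, 0.0)
    return cam / (cam.max() + 1e-8)


def prototype_cam(feature_map, prototypes):
    """Heatmap from K local prototypes.

    Assumptions: cosine similarity, ReLU, and summation over prototypes.
    Returns the normalized heatmap (H, W) and the K similarity maps (K, H, W).
    """
    c, h, w = feature_map.shape
    feats = l2_normalize(feature_map.reshape(c, h * w).T)  # (H*W, C)
    protos = l2_normalize(prototypes)                      # (K, C)
    sims = np.maximum(feats @ protos.T, 0.0)               # (H*W, K)
    sim_maps = sims.T.reshape(-1, h, w)                    # K similarity matrices
    heat = sim_maps.sum(axis=0)
    return heat / (heat.max() + 1e-8), sim_maps


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Stand-in for unpooled features of 5 images of one class: (images, C, H, W)
    class_feats = rng.normal(size=(5, 8, 4, 4))
    # Every spatial position of every image contributes one local feature
    local_feats = class_feats.transpose(0, 2, 3, 1).reshape(-1, 8)
    prototypes = kmeans_prototypes(local_feats, k=3)
    heatmap, _ = prototype_cam(class_feats[0], prototypes)
    print(heatmap.shape)
```

The random arrays stand in for real backbone features; in practice the feature maps would come from the classification network with its last pooling layer removed, as the abstract describes.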

updated: Sat Mar 18 2023 04:47:42 GMT+0000 (UTC)

published: Sat Mar 18 2023 04:47:42 GMT+0000 (UTC)

arXiv
