Token Contrast for Weakly-Supervised Semantic Segmentation

Lixiang Ru; Heliang Zheng; Yibing Zhan; Bo Du

Weakly-Supervised Semantic Segmentation (WSSS) using image-level labels typically relies on Class Activation Maps (CAMs) to generate pseudo labels. Limited by the local structure perception of CNNs, CAMs usually cannot identify the integral object regions. Although the recent Vision Transformer (ViT) can remedy this flaw, we observe that it also introduces an over-smoothing issue, i.e., the final patch tokens tend to become uniform. In this work, we propose Token Contrast (ToCo) to address this issue and further explore the strengths of ViT for WSSS. First, motivated by the observation that the intermediate layers of ViT still retain semantic diversity, we design a Patch Token Contrast module (PTC). PTC supervises the final patch tokens with pseudo token relations derived from intermediate layers, allowing them to align with the semantic regions and thus yield more accurate CAMs. Second, to further differentiate the low-confidence regions in the CAMs, we devise a Class Token Contrast module (CTC), inspired by the fact that class tokens in ViT can capture high-level semantics. CTC encourages representation consistency between uncertain local regions and global objects by contrasting their class tokens. Experiments on the PASCAL VOC and MS COCO datasets show that the proposed ToCo significantly outperforms other single-stage competitors and achieves performance comparable to state-of-the-art multi-stage methods. Code is available at https://github.com/rulixiang/ToCo.
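
To make the patch token contrast idea more concrete, the following is a minimal NumPy sketch, not the authors' implementation (see the repository above for that). It assumes that pseudo token relations can be approximated by thresholding pairwise cosine similarity between intermediate-layer tokens; the function names, the thresholds `pos_thresh` and `neg_thresh`, and the exact loss form are all hypothetical choices made for illustration.

```python
import numpy as np


def cosine_similarity_matrix(tokens):
    """Pairwise cosine similarity between the rows of an (N, D) token matrix."""
    norms = np.linalg.norm(tokens, axis=1, keepdims=True)
    unit = tokens / np.clip(norms, 1e-8, None)
    return unit @ unit.T


def pseudo_relations(intermediate_tokens, pos_thresh=0.7, neg_thresh=0.3):
    """Hypothetical rule: +1 for similar pairs, -1 for dissimilar pairs, 0 = ignored."""
    sim = cosine_similarity_matrix(intermediate_tokens)
    relations = np.zeros_like(sim)
    relations[sim >= pos_thresh] = 1.0
    relations[sim <= neg_thresh] = -1.0
    np.fill_diagonal(relations, 0.0)  # a token is not paired with itself
    return relations


def patch_token_contrast_loss(final_tokens, relations):
    """Pull positive pairs together and push negative pairs apart (illustrative loss)."""
    sim = cosine_similarity_matrix(final_tokens)
    pos = relations > 0
    neg = relations < 0
    pos_loss = (1.0 - sim[pos]).mean() if pos.any() else 0.0
    neg_loss = np.clip(sim[neg], 0.0, None).mean() if neg.any() else 0.0
    return float(pos_loss + neg_loss)


# Toy example: four patch tokens forming two semantic groups.
intermediate = np.array([[1.0, 0.0], [1.0, 0.1], [0.0, 1.0], [0.1, 1.0]])
relations = pseudo_relations(intermediate)

over_smoothed = np.ones((4, 2))  # final tokens collapsed to one direction
diverse = intermediate.copy()    # final tokens that keep the two groups apart

print(patch_token_contrast_loss(over_smoothed, relations))
print(patch_token_contrast_loss(diverse, relations))
```

In this toy setting the collapsed (over-smoothed) tokens incur a larger loss than the tokens that preserve the two groups, which is the behavior a contrast-style supervision signal is meant to have.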

updated: Thu Mar 02 2023 13:51:58 GMT+0000 (UTC)

published: Thu Mar 02 2023 13:51:58 GMT+0000 (UTC)

arXiv
