Positive-Negative Equal Contrastive Loss for Semantic Segmentation

Jing Wang; Lingfei Xuan; Wenxuan Wang; Tianxiang Zhang; Jiangyun Li

Contextual information is critical for various computer vision tasks. Previous works commonly design plug-and-play modules and structural losses to effectively extract and aggregate global context. These methods use fine labels to optimize the model but ignore that fine-trained features are also a valuable training resource, which can introduce a preferable distribution to hard pixels (i.e., misclassified pixels). Inspired by contrastive learning in the unsupervised paradigm, we apply the contrastive loss in a supervised manner and redesign the loss function to cast off conventions inherited from unsupervised learning (e.g., the imbalance of positives and negatives, confusion in anchor computation). To this end, we propose the Positive-Negative Equal contrastive loss (PNE loss), which increases the latent impact of positive embeddings on the anchor and treats positive and negative sample pairs equally. The PNE loss can be plugged directly into existing semantic segmentation frameworks and leads to excellent performance with negligible extra computational cost. We use a number of classic segmentation methods (e.g., DeepLabV3, HRNetV2, OCRNet, UperNet) and backbones (e.g., ResNet, HRNet, Swin Transformer) to conduct comprehensive experiments and achieve state-of-the-art performance on three benchmark datasets (Cityscapes, COCO-Stuff, and ADE20K). Our code will be publicly available soon.
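The abstract does not state the PNE loss formula, so the following is only a minimal sketch of the starting point it mentions: a contrastive loss applied in a supervised manner, where pixels sharing a label act as positives and all other pixels as negatives. It follows the generic supervised contrastive (SupCon-style) formulation, not the PNE loss itself. The function names, the `temperature` value, and the toy embeddings are assumptions made for illustration.

```python
import math


def l2_normalize(vector):
    """Scale a vector to unit length (left unchanged if its norm is zero)."""
    norm = math.sqrt(sum(x * x for x in vector)) or 1.0
    return [x / norm for x in vector]


def supervised_contrastive_loss(embeddings, labels, temperature=0.1):
    """Generic SupCon-style loss over pixel embeddings (NOT the paper's PNE loss).

    embeddings: list of feature vectors, one per sampled pixel (hypothetical input).
    labels: list of class ids, one per pixel.
    """
    feats = [l2_normalize(e) for e in embeddings]
    n = len(feats)
    total, counted = 0.0, 0
    for i in range(n):
        # Temperature-scaled cosine similarity of anchor i to every other pixel.
        sims = {
            j: sum(a * b for a, b in zip(feats[i], feats[j])) / temperature
            for j in range(n)
            if j != i
        }
        positives = [j for j in sims if labels[j] == labels[i]]
        if not positives:
            continue  # an anchor without positives contributes nothing
        # Numerically stable log-sum-exp over all non-anchor pixels.
        peak = max(sims.values())
        log_denom = peak + math.log(sum(math.exp(s - peak) for s in sims.values()))
        # Average negative log-probability of picking each positive.
        total += -sum(sims[j] - log_denom for j in positives) / len(positives)
        counted += 1
    return total / counted if counted else 0.0


# Toy example: two tight clusters in 2-D.
toy_embeddings = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
loss_matching = supervised_contrastive_loss(toy_embeddings, [0, 0, 1, 1])
loss_mismatched = supervised_contrastive_loss(toy_embeddings, [0, 1, 0, 1])
```

In this toy setup, labels that agree with the cluster structure give a much lower loss than labels that cut across it. In a segmentation pipeline, such a term would typically be added to the usual per-pixel cross-entropy loss with a weighting factor; how the PNE loss balances positive and negative pairs is specified in the paper, not here.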

updated: Thu Sep 08 2022 08:07:54 GMT+0000 (UTC)

published: Mon Jul 04 2022 13:51:29 GMT+0000 (UTC)

arXiv
