Attention-Guided Supervised Contrastive Learning for Semantic Segmentation

Ho Hin Lee; Yucheng Tang; Qi Yang; Xin Yu; Shunxing Bao; Bennett A. Landman; Yuankai Huo


Contrastive learning has shown superior performance in embedding global, spatially invariant features in computer vision (e.g., image classification). However, its success in embedding local, spatially variant features is still limited, especially for semantic segmentation. In a per-pixel prediction task, more than one label can exist in a single image (e.g., an image may contain a cat, a dog, and grass), which makes it difficult to define 'positive' or 'negative' pairs in a canonical contrastive learning setting. In this paper, we propose an attention-guided supervised contrastive learning approach that highlights a single semantic object at a time as the target. With our design, the same image can be embedded into different semantic clusters by supplying semantic attention (i.e., coerce semantic masks) as an additional input channel. To achieve such attention, a novel two-stage training strategy is presented. We evaluate the proposed method on multi-organ medical image segmentation, our major task, using both in-house data and the BTCV 2015 dataset. Compared with state-of-the-art supervised and semi-supervised training approaches using a ResNet-50 backbone, our proposed pipeline yields substantial improvements of 5.53% and 6.09% in Dice score on the two medical image segmentation cohorts, respectively. The performance of the proposed method on natural images is assessed on the PASCAL VOC 2012 dataset, where it achieves a substantial improvement of 2.75%.
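The core idea in the abstract, pairing one image with one semantic mask at a time so that each (image, mask) input has a single unambiguous label, can be illustrated with a minimal NumPy sketch. This is an illustration under stated assumptions, not the authors' implementation: the function names `attention_guided_inputs` and `supervised_contrastive_loss`, the `ignore_label` and `temperature` parameters, and the use of ground-truth label maps as the attention source are all hypothetical. The loss is the commonly used supervised contrastive form; the abstract does not state which variant the paper uses, and the two-stage training strategy is not sketched because the abstract gives no details.

```python
import numpy as np


def attention_guided_inputs(image, label_map, ignore_label=0):
    """Build one (C+1, H, W) input per class present in label_map.

    image: (C, H, W) float array; label_map: (H, W) integer array.
    The extra channel is a binary mask for a single class (assumed
    form of the "semantic attention" channel).
    """
    classes = [int(c) for c in np.unique(label_map) if c != ignore_label]
    channels, height, width = image.shape
    if not classes:
        empty = np.empty((0, channels + 1, height, width), dtype=image.dtype)
        return empty, np.array([], dtype=int)
    inputs = []
    for c in classes:
        mask = (label_map == c).astype(image.dtype)[None]  # (1, H, W)
        inputs.append(np.concatenate([image, mask], axis=0))
    return np.stack(inputs), np.array(classes)


def supervised_contrastive_loss(embeddings, labels, temperature=0.1):
    """Supervised contrastive loss over a batch of embeddings.

    embeddings: (N, D) array; labels: (N,) integer array.
    Samples sharing a label are positives; all others are negatives.
    """
    z = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sim = z @ z.T / temperature
    sim = sim - sim.max(axis=1, keepdims=True)  # numerical stability
    not_self = ~np.eye(len(labels), dtype=bool)
    exp_sim = np.exp(sim) * not_self  # exclude each anchor from its own denominator
    log_prob = sim - np.log(exp_sim.sum(axis=1, keepdims=True))
    positives = (labels[:, None] == labels[None, :]) & not_self
    n_pos = positives.sum(axis=1)
    valid = n_pos > 0  # anchors with at least one positive
    if not valid.any():
        return 0.0
    mean_log_prob_pos = (log_prob * positives).sum(axis=1)[valid] / n_pos[valid]
    return float(-mean_log_prob_pos.mean())
```

In this sketch, an image containing three foreground classes produces three inputs that share the same pixels but differ in the mask channel, so an encoder can map them to three different class clusters. The class returned alongside each input serves as its label in the contrastive loss.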

updated: Thu Jun 03 2021 05:01:11 GMT+0000 (UTC)

published: Thu Jun 03 2021 05:01:11 GMT+0000 (UTC)

arXiv

