Multi-View Correlation Consistency for Semi-Supervised Semantic Segmentation

Yunzhong Hou; Stephen Gould; Liang Zheng

Semi-supervised semantic segmentation needs rich and robust supervision on unlabeled data. Consistency learning requires the same pixel to have similar features in different augmented views, which is a robust signal but neglects relationships with other pixels. Contrastive learning, by comparison, considers rich pairwise relationships, but assigning binary positive-negative supervision signals to pixel pairs can be difficult. In this paper, we take the best of both worlds and propose multi-view correlation consistency (MVCC) learning: it considers rich pairwise relationships in self-correlation matrices and matches them across views to provide robust supervision. Together with this correlation consistency loss, we propose a view-coherent data augmentation strategy that guarantees pixel-to-pixel correspondence between different views. In a series of semi-supervised settings on two datasets, we report competitive accuracy compared with state-of-the-art methods. Notably, on Cityscapes, we achieve 76.8% mIoU with 1/8 of the labeled data, just 0.6% shy of the fully supervised oracle.
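As a rough illustration of the idea of matching self-correlation matrices across views, the following minimal sketch may help. It is not the authors' implementation: the function names `self_correlation` and `correlation_consistency_loss`, the use of cosine similarity, and the mean-squared-error matching term are all assumptions made for illustration, and the abstract does not specify the actual loss. Features are plain Python lists of per-pixel vectors, and row i in both views is assumed to refer to the same pixel, which is the correspondence the view-coherent augmentation is said to guarantee.

```python
import math


def self_correlation(features):
    """Cosine similarity between every pair of per-pixel feature vectors.

    features: list of N vectors (one per pixel). Returns an N x N matrix.
    Cosine similarity is an assumption; the abstract does not fix the measure.
    """
    unit = []
    for vec in features:
        norm = math.sqrt(sum(x * x for x in vec)) or 1.0  # guard against zero vectors
        unit.append([x / norm for x in vec])
    return [[sum(a * b for a, b in zip(u, v)) for v in unit] for u in unit]


def correlation_consistency_loss(feats_a, feats_b):
    """Mean squared difference between the two views' self-correlation matrices.

    Assumes row i of feats_a and row i of feats_b describe the same pixel.
    The squared-error form is a hypothetical stand-in for the paper's loss.
    """
    corr_a = self_correlation(feats_a)
    corr_b = self_correlation(feats_b)
    n = len(corr_a)
    total = sum(
        (corr_a[i][j] - corr_b[i][j]) ** 2 for i in range(n) for j in range(n)
    )
    return total / (n * n)


# Three pixels, two feature channels, seen in two augmented views.
view_a = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
view_b = [[2.0, 0.0], [0.0, 3.0], [0.5, 0.5]]  # same directions, different scales
view_c = [[1.0, 0.0], [1.0, 0.0], [1.0, 0.0]]  # pairwise structure collapsed

loss_matched = correlation_consistency_loss(view_a, view_b)
loss_collapsed = correlation_consistency_loss(view_a, view_c)
```

In this sketch, `view_b` differs from `view_a` in every individual feature vector yet preserves all pairwise relationships, so the loss is near zero, while `view_c` collapses those relationships and is penalized. This is the sense in which a correlation-based signal can supervise relationships between pixels rather than only each pixel's own feature.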

updated: Wed Aug 17 2022 17:59:11 GMT+0000 (UTC)

published: Wed Aug 17 2022 17:59:11 GMT+0000 (UTC)

arXiv
