Group-Wise Semantic Mining for Weakly Supervised Semantic Segmentation

Xueyi Li; Tianfei Zhou; Jianwu Li; Yi Zhou; Zhaoxiang Zhang

弱教師ありセマンティックセグメンテーションのためのグループワイズセマンティックマイニング

ディープラーニングはデータを大量に消費する性質があるため、ディープビジュアルモデルをトレーニングするために十分なグラウンドトゥルース監視を取得することは、長年にわたってボトルネックになっています。これは、ピクセルレベルの注釈を必要とするセマンティックセグメンテーションなどの一部の構造化された予測タスクで悪化します。この作業は、画像レベルの注釈とピクセルレベルのセグメンテーションの間のギャップを埋めることを目的として、弱教師ありセマンティックセグメンテーション（WSSS）に対処します。 WSSSは、画像のグループ内のセマンティック依存関係を明示的にモデル化して、より正確なセグメンテーションモデルのトレーニングに使用できる、より信頼性の高い疑似グラウンドトゥルースを推定する新しいグループ単位の学習タスクとして定式化します。特に、グループごとのセマンティックマイニング用のグラフニューラルネットワーク（GNN）を考案します。この場合、入力画像はグラフノードとして表され、画像のペア間の基本的な関係は、効率的な共注意メカニズムによって特徴付けられます。さらに、モデルが一般的なセマンティクスのみに過度の注意を払うのを防ぐために、グラフドロップアウトレイヤーをさらに提案し、モデルがより正確で完全なオブジェクト応答を学習するように促します。ネットワーク全体は、反復メッセージパッシングによってエンドツーエンドでトレーニング可能です。これにより、画像全体にインタラクションキューが伝播され、パフォーマンスが徐々に向上します。人気のあるPASCALVOC 2012とCOCOベンチマークで実験を行い、モデルは最先端のパフォーマンスを生み出します。コードはhttps://github.com/Lixy1997/Group-WSSSで入手できます。

Acquiring sufficient ground-truth supervision to train deep visual models has been a bottleneck over the years due to the data-hungry nature of deep learning. This is exacerbated in some structured prediction tasks, such as semantic segmentation, which requires pixel-level annotations. This work addresses weakly supervised semantic segmentation (WSSS), with the goal of bridging the gap between image-level annotations and pixel-level segmentation. We formulate WSSS as a novel group-wise learning task that explicitly models semantic dependencies in a group of images to estimate more reliable pseudo ground-truths, which can be used for training more accurate segmentation models. In particular, we devise a graph neural network (GNN) for group-wise semantic mining, wherein input images are represented as graph nodes, and the underlying relations between a pair of images are characterized by an efficient co-attention mechanism. Moreover, in order to prevent the model from paying excessive attention to common semantics only, we further propose a graph dropout layer, encouraging the model to learn more accurate and complete object responses. The whole network is end-to-end trainable by iterative message passing, which propagates interaction cues over the images to progressively improve the performance. We conduct experiments on the popular PASCAL VOC 2012 and COCO benchmarks, and our model yields state-of-the-art performance. Our code is available at: https://github.com/Lixy1997/Group-WSSS.

updated: Wed Dec 09 2020 12:40:13 GMT+0000 (UTC)

published: Wed Dec 09 2020 12:40:13 GMT+0000 (UTC)

arXiv

参考文献 (このサイトで利用可能なもの) / References (only if available on this site)

被参照文献 (このサイトで利用可能なものを新しい順に) / Citations (only if available on this site, in order of most recent)

Amazon.co.jpアソシエイト