Leveraging Hidden Positives for Unsupervised Semantic Segmentation

Hyun Seok Seong; WonJun Moon; SuBeen Lee; Jae-Pil Heo

The heavy demand for human labor to produce pixel-level annotations has driven the emergence of unsupervised semantic segmentation. Although recent work employing a vision transformer (ViT) backbone shows exceptional performance, it still gives little consideration to task-specific training guidance and local semantic consistency. To tackle these issues, we leverage contrastive learning by excavating hidden positives to learn rich semantic relationships and ensure semantic consistency in local regions. Specifically, we first discover two types of global hidden positives for each anchor, task-agnostic and task-specific ones, based on the feature similarities defined by a fixed pre-trained backbone and a segmentation head in training, respectively. A gradual increase in the contribution of the latter induces the model to capture task-specific semantic features. In addition, we introduce a gradient propagation strategy to learn semantic consistency between adjacent patches, under the inherent premise that nearby patches are highly likely to possess the same semantics. Specifically, we propagate the loss to local hidden positives, i.e., semantically similar nearby patches, in proportion to predefined similarity scores. With these training schemes, our proposed method achieves new state-of-the-art (SOTA) results on the COCO-stuff, Cityscapes, and Potsdam-3 datasets. Our code is available at https://github.com/hynnsk/HP.
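The global hidden-positive idea described in the abstract can be illustrated with a minimal NumPy sketch. It is not taken from the authors' repository: the function names, the fixed cosine-similarity threshold, the linear ramp for the task-specific term, and the InfoNCE-style loss are all assumptions made for illustration, and the local gradient propagation step is not covered.

```python
import numpy as np

def l2_normalize(x, eps=1e-12):
    # Row-wise L2 normalization so that dot products become cosine similarities.
    return x / np.maximum(np.linalg.norm(x, axis=1, keepdims=True), eps)

def hidden_positive_mask(features, threshold):
    # mask[i, j] is True when patch j counts as a hidden positive of anchor i,
    # i.e. their cosine similarity exceeds `threshold` (assumed fixed here).
    z = l2_normalize(features)
    mask = (z @ z.T) > threshold
    np.fill_diagonal(mask, False)  # an anchor is not its own positive
    return mask

def task_specific_weight(step, total_steps):
    # Hypothetical linear ramp from 0 to 1 for the task-specific contribution.
    return min(1.0, max(0.0, step / total_steps))

def contrastive_loss(embeddings, pos_mask, temperature=0.1):
    # InfoNCE-style loss averaged over the positives of each anchor.
    z = l2_normalize(embeddings)
    logits = z @ z.T / temperature
    np.fill_diagonal(logits, -np.inf)  # exclude self-similarity
    row_max = logits.max(axis=1, keepdims=True)
    log_norm = row_max + np.log(np.exp(logits - row_max).sum(axis=1, keepdims=True))
    log_prob = logits - log_norm
    pos_count = pos_mask.sum(axis=1)
    has_pos = pos_count > 0
    if not has_pos.any():
        return 0.0
    pos_log_prob = np.where(pos_mask, log_prob, 0.0).sum(axis=1)
    return float(-(pos_log_prob[has_pos] / pos_count[has_pos]).mean())

def hidden_positive_loss(backbone_feats, head_feats, step, total_steps, threshold=0.5):
    # Task-agnostic positives come from the fixed backbone features,
    # task-specific positives from the segmentation head being trained.
    agnostic = hidden_positive_mask(backbone_feats, threshold)
    specific = hidden_positive_mask(head_feats, threshold)
    w = task_specific_weight(step, total_steps)
    return contrastive_loss(head_feats, agnostic) + w * contrastive_loss(head_feats, specific)
```

In this sketch the task-specific term contributes nothing at step 0 and reaches full weight at `total_steps`, which mirrors the "gradual increase in the contribution of the latter" only in spirit; the actual schedule and positive-selection rule are whatever the released code implements.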

updated: Mon Mar 27 2023 08:57:28 GMT+0000 (UTC)

published: Mon Mar 27 2023 08:57:28 GMT+0000 (UTC)

arXiv
