PiPa: Pixel- and Patch-wise Self-supervised Learning for Domain Adaptative Semantic Segmentation

Mu Chen; Zhedong Zheng; Yi Yang; Tat-Seng Chua

Unsupervised Domain Adaptation (UDA) aims to improve the generalization of a learned model to other domains. Domain-invariant knowledge is transferred from a model trained on a labeled source domain, e.g., video games, to unlabeled target domains, e.g., real-world scenes, which saves annotation cost. Existing UDA methods for semantic segmentation usually focus on minimizing the inter-domain discrepancy at various levels, e.g., pixels, features, and predictions, in order to extract domain-invariant knowledge. However, primary intra-domain knowledge, such as context correlation inside an image, remains underexplored. To fill this gap, we propose PiPa, a unified pixel- and patch-wise self-supervised learning framework for domain adaptive semantic segmentation that facilitates intra-image pixel-wise correlations and patch-wise semantic consistency against different contexts. The proposed framework exploits the inherent structures of intra-domain images, which: (1) explicitly encourages learning discriminative pixel-wise features with intra-class compactness and inter-class separability, and (2) motivates robust feature learning of the identical patch against different contexts or fluctuations. Extensive experiments verify the effectiveness of the proposed method, which obtains competitive accuracy on two widely used UDA benchmarks, i.e., 75.6 mIoU on GTA to Cityscapes and 68.2 mIoU on Synthia to Cityscapes. Moreover, our method is compatible with other UDA approaches and further improves their performance without introducing extra parameters.
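The abstract does not spell out the loss functions, so the following is only a minimal sketch of how "intra-class compactness and inter-class separability" of pixel-wise features is commonly encouraged: a generic InfoNCE-style contrastive loss in which pixels with the same label act as positives and all other pixels as negatives. The function name `pixel_contrastive_loss`, the `temperature` value, and the toy features are assumptions made for illustration and are not taken from the paper.

```python
import math


def l2_normalize(v):
    """Scale a feature vector to unit length (left unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def pixel_contrastive_loss(features, labels, temperature=0.1):
    """Generic InfoNCE-style pixel contrastive loss (illustrative, not the paper's).

    features: list of per-pixel feature vectors
    labels:   list of per-pixel class ids (same length as features)
    Pixels that have no same-class partner are skipped.
    """
    feats = [l2_normalize(f) for f in features]
    n = len(feats)
    total, count = 0.0, 0
    for i in range(n):
        positives = [j for j in range(n) if j != i and labels[j] == labels[i]]
        if not positives:
            continue
        # Cosine similarity of pixel i to every other pixel, scaled by temperature.
        logits = {j: dot(feats[i], feats[j]) / temperature
                  for j in range(n) if j != i}
        # Numerically stable log-sum-exp over all other pixels.
        max_logit = max(logits.values())
        log_denom = max_logit + math.log(
            sum(math.exp(v - max_logit) for v in logits.values()))
        # Average negative log-probability assigned to the positives.
        total += sum(log_denom - logits[j] for j in positives) / len(positives)
        count += 1
    return total / count if count else 0.0


if __name__ == "__main__":
    # Four toy "pixels": two near the x-axis, two near the y-axis.
    feats = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
    print("labels match clusters:", pixel_contrastive_loss(feats, [0, 0, 1, 1]))
    print("labels cut across clusters:", pixel_contrastive_loss(feats, [0, 1, 0, 1]))
```

In this toy setup the loss is lower when same-class pixels already lie close together in feature space, which is the behavior such an objective rewards. On unlabeled target images, a UDA method would typically have to rely on pseudo-labels in place of `labels`; that is a common practice and an assumption here, not a detail stated in the abstract.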

updated: Mon Nov 14 2022 18:31:24 GMT+0000 (UTC)

published: Mon Nov 14 2022 18:31:24 GMT+0000 (UTC)

arXiv
