Semi-supervised Domain Adaptation for Semantic Segmentation

Ying Chen; Xu Ouyang; Kaiyue Zhu; Gady Agam

Deep learning approaches for semantic segmentation rely primarily on supervised learning and require substantial effort to produce pixel-level annotations. Further, such approaches may perform poorly when applied to unseen image domains. To cope with these limitations, both unsupervised domain adaptation (UDA), which uses full source supervision but no target supervision, and semi-supervised learning (SSL), which uses partial supervision, have been proposed. While such methods are effective at aligning different feature distributions, there is still a need to exploit unlabeled data efficiently in order to close the performance gap with respect to fully supervised methods. In this paper we address semi-supervised domain adaptation (SSDA) for semantic segmentation, where a large amount of labeled source data and a small amount of labeled target data are available. We propose a novel and effective two-step semi-supervised dual-domain adaptation (SSDDA) approach to address both cross-domain and intra-domain gaps in semantic segmentation. The proposed framework comprises two mixing modules. First, we conduct cross-domain adaptation via an image-level mixing strategy, which learns to reduce the distribution shift between the features of the source data and the target data. Second, intra-domain adaptation is achieved using a separate student-teacher network, which is built to generate category-level data augmentation by mixing unlabeled target data in a way that respects predicted object boundaries. We demonstrate that the proposed approach outperforms state-of-the-art methods on two common synthetic-to-real semantic segmentation benchmarks. An extensive ablation study is provided to further validate the effectiveness of our approach.
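
The two mixing modules can be illustrated with a minimal NumPy sketch. The abstract does not specify the exact mixing operations, so everything below is an assumption made for illustration: `region_mix` uses a CutMix-style rectangular paste as a stand-in for image-level mixing between a source and a target image, and `class_mix` uses a ClassMix-style mask built from predicted classes as a stand-in for category-level mixing of unlabeled target images. The function names, the `box` parameter, and the choice of pasting half of the predicted classes are hypothetical and are not taken from the paper.

```python
import numpy as np


def region_mix(src_img, tgt_img, src_lbl, tgt_lbl, box):
    """Image-level mixing (assumed CutMix-style): paste a source rectangle onto the target.

    Images are H x W x C arrays, labels are H x W arrays.
    box is (top, left, height, width) in pixels.
    """
    top, left, height, width = box
    mask = np.zeros(src_lbl.shape, dtype=bool)
    mask[top:top + height, left:left + width] = True
    # Broadcast the 2-D mask over the channel axis for the image.
    mixed_img = np.where(mask[..., None], src_img, tgt_img)
    mixed_lbl = np.where(mask, src_lbl, tgt_lbl)
    return mixed_img, mixed_lbl


def class_mix(img_a, img_b, pred_a, pred_b, rng):
    """Category-level mixing (assumed ClassMix-style) of two unlabeled target images.

    pred_a and pred_b are per-pixel class predictions, for example from a
    teacher network. Pixels of img_a that belong to a random half of the
    classes predicted in pred_a are pasted onto img_b, so the pasted regions
    follow the predicted object boundaries.
    """
    classes = np.unique(pred_a)
    n_pick = max(1, len(classes) // 2)
    chosen = rng.choice(classes, size=n_pick, replace=False)
    mask = np.isin(pred_a, chosen)
    mixed_img = np.where(mask[..., None], img_a, img_b)
    mixed_lbl = np.where(mask, pred_a, pred_b)
    return mixed_img, mixed_lbl, mask
```

In a student-teacher setup of the kind the abstract describes, the mixed image would typically be fed to the student while the mixed predictions serve as its pseudo-labels; that wiring is likewise an assumption here, not a detail confirmed by the abstract.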

updated: Wed Oct 20 2021 16:13:00 GMT+0000 (UTC)

published: Wed Oct 20 2021 16:13:00 GMT+0000 (UTC)

arXiv
