Improving Semi-Supervised and Domain-Adaptive Semantic Segmentation with Self-Supervised Depth Estimation

Lukas Hoyer; Dengxin Dai; Qin Wang; Yuhua Chen; Luc Van Gool

Training deep networks for semantic segmentation requires large amounts of labeled training data, which is a major challenge in practice because labeling segmentation masks is highly labor-intensive. To address this issue, we present a framework for semi-supervised and domain-adaptive semantic segmentation that is enhanced by self-supervised monocular depth estimation (SDE) trained only on unlabeled image sequences. In particular, we use SDE as an auxiliary task across the entire learning framework. First, we automatically select the most useful samples to be annotated for semantic segmentation, based on the correlation of sample diversity and difficulty between SDE and semantic segmentation. Second, we implement a strong data augmentation by mixing images and labels using the geometry of the scene. Third, we transfer knowledge from features learned during SDE to semantic segmentation by means of transfer and multi-task learning. Fourth, we exploit additional labeled synthetic data with Cross-Domain DepthMix and Matching Geometry Sampling to align synthetic and real data. We validate the proposed model on the Cityscapes dataset, where all four contributions demonstrate significant performance gains, and achieve state-of-the-art results for semi-supervised semantic segmentation as well as for semi-supervised domain adaptation. In particular, with only 1/30 of the Cityscapes labels, our method achieves 92% of the fully supervised baseline performance, and 97% when exploiting additional data from GTA. The source code is available at https://github.com/lhoyer/improving_segmentation_with_selfsupervised_depth.
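As a rough illustration of the second contribution (mixing images and labels using scene geometry), the sketch below shows one plausible reading: pixels of one sample are pasted onto another wherever the first sample's estimated depth is smaller, so that nearer objects occlude farther ones. The function name `depth_mix`, the tolerance `eps`, and the array layouts are assumptions made for illustration. They are not taken from the abstract or from the released code.

```python
import numpy as np


def depth_mix(img_a, lbl_a, depth_a, img_b, lbl_b, depth_b, eps=0.0):
    """Mix two samples so that the nearer pixel wins at each location.

    Hypothetical sketch, not the authors' implementation.
    img_*:   (H, W, C) image arrays
    lbl_*:   (H, W) integer class maps
    depth_*: (H, W) estimated depth, smaller means closer to the camera
    eps:     assumed tolerance added to depth_b before comparing
    """
    # True where sample A is closer than sample B (A occludes B there).
    mask = depth_a < depth_b + eps
    # Broadcast the mask over the channel axis for the image.
    mixed_img = np.where(mask[..., None], img_a, img_b)
    mixed_lbl = np.where(mask, lbl_a, lbl_b)
    return mixed_img, mixed_lbl, mask


# Tiny 2x2 example: the left column of A is near, the right column is far.
img_a = np.zeros((2, 2, 3), dtype=np.uint8)
img_b = np.full((2, 2, 3), 255, dtype=np.uint8)
lbl_a = np.full((2, 2), 1)
lbl_b = np.full((2, 2), 2)
depth_a = np.array([[1.0, 5.0], [1.0, 5.0]])
depth_b = np.full((2, 2), 3.0)

mixed_img, mixed_lbl, mask = depth_mix(img_a, lbl_a, depth_a,
                                       img_b, lbl_b, depth_b)
print(mixed_lbl)
```

In this toy case the left column of the mixed label map comes from sample A and the right column from sample B. Applying the same mask to both the image and the label map keeps the two aligned.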

updated: Sat Aug 28 2021 01:33:38 GMT+0000 (UTC)

published: Sat Aug 28 2021 01:33:38 GMT+0000 (UTC)

arXiv
