Cross-Dataset Collaborative Learning for Semantic Segmentation

Li Wang; Dong Li; Yousong Zhu; Lu Tian; Yi Shan

Recent work attempts to improve semantic segmentation performance by designing architectures tailored to a single target dataset. However, building a unified system that learns from several datasets simultaneously remains challenging because of the inherent distribution shift between them. In this paper, we present a simple, flexible, and general method for semantic segmentation, termed Cross-Dataset Collaborative Learning (CDCL). Given multiple labeled datasets, we aim to improve both the generalization and the discriminative power of the feature representations on each dataset. Specifically, we first introduce a family of Dataset-Aware Blocks (DAB) as the fundamental computing units of the network, which help capture homogeneous representations and heterogeneous statistics across different datasets. Second, we propose a Dataset Alternation Training (DAT) mechanism to make the optimization procedure efficient. We conduct extensive evaluations on four diverse datasets, i.e., Cityscapes, BDD100K, CamVid, and COCO Stuff, under both single-dataset and cross-dataset settings. Experimental results demonstrate that our method consistently achieves notable improvements over prior single-dataset and cross-dataset training methods without introducing extra FLOPs. In particular, with the same PSPNet (ResNet-18) architecture, our method outperforms the single-dataset baseline by 5.65%, 6.57%, and 5.79% mIoU on the validation sets of Cityscapes, BDD100K, and CamVid, respectively. Code and models will be released.
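The abstract only names the two components, so the following is a rough illustration under stated assumptions, not the authors' released code. It assumes a Dataset-Aware Block is a projection whose weights are shared by all datasets (the "homogeneous representations") followed by normalization whose running statistics are kept separately per dataset (the "heterogeneous statistics"), and it assumes Dataset Alternation Training simply cycles round-robin through the datasets from one iteration to the next. The names `DatasetAwareBlock` and `alternation_schedule`, and the `momentum`/`eps` parameters, are hypothetical.

```python
import math
from typing import Dict, List, Sequence


class DatasetAwareBlock:
    """Hypothetical sketch of a dataset-aware unit (not the authors' code).

    Assumption: the projection weights are shared by every dataset, while
    each dataset keeps its own running normalization statistics.
    """

    def __init__(self, weights: Sequence[float], datasets: Sequence[str],
                 momentum: float = 0.1, eps: float = 1e-5) -> None:
        self.weights = list(weights)  # shared across all datasets
        self.momentum = momentum
        self.eps = eps
        # Per-dataset running (mean, variance), initialised like BatchNorm.
        self.stats: Dict[str, List[float]] = {d: [0.0, 1.0] for d in datasets}

    def forward(self, batch: Sequence[Sequence[float]], dataset: str,
                training: bool = True) -> List[float]:
        # Shared projection: identical computation for every dataset.
        z = [sum(w * x for w, x in zip(self.weights, s)) for s in batch]
        if training:
            # Normalise with this batch's statistics and update only the
            # running statistics that belong to `dataset`.
            mean = sum(z) / len(z)
            var = sum((v - mean) ** 2 for v in z) / len(z)
            m = self.momentum
            run_mean, run_var = self.stats[dataset]
            self.stats[dataset] = [(1 - m) * run_mean + m * mean,
                                   (1 - m) * run_var + m * var]
        else:
            # Inference: use the stored statistics of the given dataset.
            mean, var = self.stats[dataset]
        return [(v - mean) / math.sqrt(var + self.eps) for v in z]


def alternation_schedule(datasets: Sequence[str], num_iters: int) -> List[str]:
    """Hypothetical round-robin schedule: iteration i uses dataset i mod K."""
    return [datasets[i % len(datasets)] for i in range(num_iters)]


if __name__ == "__main__":
    names = ["cityscapes", "bdd100k", "camvid"]
    block = DatasetAwareBlock(weights=[0.5, -0.25], datasets=names)
    # Toy per-dataset batches with deliberately different input statistics.
    batches = {
        "cityscapes": [[1.0, 2.0], [3.0, 0.0], [2.0, 2.0]],
        "bdd100k": [[10.0, 4.0], [12.0, 8.0], [9.0, 1.0]],
        "camvid": [[-1.0, 0.0], [0.0, 2.0], [1.0, -2.0]],
    }
    for it, name in enumerate(alternation_schedule(names, 6)):
        out = block.forward(batches[name], name)
        # A real training loop would compute the segmentation loss for
        # `name` here and update the shared weights; this only shows routing.
        print(it, name, [round(v, 3) for v in out])
```

In this sketch, each sample passes through exactly one set of normalization statistics at inference, so the per-sample arithmetic matches that of a single-dataset block, which is one plausible way a design like this could avoid extra FLOPs.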

updated: Sun Mar 21 2021 09:59:47 GMT+0000 (UTC)

published: Sun Mar 21 2021 09:59:47 GMT+0000 (UTC)

arXiv