Domain Adaptive and Generalizable Network Architectures and Training Strategies for Semantic Image Segmentation

Lukas Hoyer; Dengxin Dai; Luc Van Gool

Unsupervised domain adaptation (UDA) and domain generalization (DG) enable machine learning models trained on a source domain to perform well on unlabeled or even unseen target domains. As previous UDA&DG semantic segmentation methods are mostly based on outdated networks, we benchmark more recent architectures, reveal the potential of Transformers, and design the DAFormer network tailored for UDA&DG. It is enabled by three training strategies that avoid overfitting to the source domain: (1) Rare Class Sampling mitigates the bias toward common source domain classes, while (2) a Thing-Class ImageNet Feature Distance and (3) a learning rate warmup promote feature transfer from ImageNet pretraining. As UDA&DG are usually GPU memory intensive, most previous methods downscale or crop images. However, low-resolution predictions often fail to preserve fine details, while models trained with cropped images fall short in capturing long-range, domain-robust context information. Therefore, we propose HRDA, a multi-resolution framework for UDA&DG that combines the strengths of small high-resolution crops, which preserve fine segmentation details, and large low-resolution crops, which capture long-range context dependencies, using a learned scale attention. DAFormer and HRDA significantly improve the state of the art in UDA&DG by more than 10 mIoU on 5 different benchmarks. The implementation is available at https://github.com/lhoyer/HRDA.
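Three of the ideas named in the abstract can be illustrated with a small toy sketch. It is not taken from the linked repository, and every function name, formula, and number in it is an assumption made for illustration: rare-class weighting is modeled as a temperature softmax over (1 - class frequency), the warmup is taken to be linear, and the scale attention is a hand-set per-pixel weight rather than a learned one. Flat lists stand in for aligned per-pixel score maps.

```python
import math
import random


def rare_class_probs(class_freq, temperature=0.1):
    """Assumed weighting: softmax over (1 - frequency), so rare classes score higher."""
    scores = {c: math.exp((1.0 - f) / temperature) for c, f in class_freq.items()}
    total = sum(scores.values())
    return {c: s / total for c, s in scores.items()}


def sample_class(probs, rng):
    """Draw one class name according to the given sampling probabilities."""
    classes = list(probs)
    weights = [probs[c] for c in classes]
    return rng.choices(classes, weights=weights, k=1)[0]


def warmup_lr(step, base_lr, warmup_steps):
    """Assumed linear warmup: ramp from 0 to base_lr, then hold base_lr."""
    if step >= warmup_steps:
        return base_lr
    return base_lr * step / warmup_steps


def fuse_with_scale_attention(low_res_pred, high_res_pred, attention):
    """Per-pixel blend; attention near 1 trusts the high-resolution prediction."""
    return [
        (1.0 - a) * lo + a * hi
        for lo, hi, a in zip(low_res_pred, high_res_pred, attention)
    ]


if __name__ == "__main__":
    # Hypothetical class frequencies, not measured on any dataset.
    freq = {"road": 0.90, "bicycle": 0.05}
    probs = rare_class_probs(freq)
    rng = random.Random(0)
    print("sampling probabilities:", probs)
    print("sampled class:", sample_class(probs, rng))
    print("lr at step 50 of 100:", warmup_lr(50, 1e-3, 100))
    print("fused scores:", fuse_with_scale_attention([0.2, 0.8], [0.6, 0.4], [0.0, 1.0]))
```

In this sketch an attention value of 0 keeps the low-resolution score and a value of 1 keeps the high-resolution score. In the framework described above, that weight would be predicted by the network instead of being fixed by hand.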

updated: Wed Apr 26 2023 15:18:45 GMT+0000 (UTC)

published: Wed Apr 26 2023 15:18:45 GMT+0000 (UTC)

arXiv
