HGFormer: Hierarchical Grouping Transformer for Domain Generalized Semantic Segmentation

Jian Ding; Nan Xue; Gui-Song Xia; Bernt Schiele; Dengxin Dai

Current semantic segmentation models have achieved great success under the independent and identically distributed (i.i.d.) condition. However, in real-world applications, test data may come from a different domain than the training data, so it is important to improve model robustness against domain differences. This work studies semantic segmentation under the domain generalization setting, where a model is trained only on the source domain and tested on an unseen target domain. Existing work shows that Vision Transformers are more robust than CNNs and relates this robustness to the visual grouping property of self-attention. In this work, we propose a novel hierarchical grouping transformer (HGFormer) that explicitly groups pixels to form part-level masks and then whole-level masks. The masks at different scales aim to segment out both the parts and the whole of each class. HGFormer combines the mask classification results at both scales for class label prediction. We assemble multiple interesting cross-domain settings using seven public semantic segmentation datasets. Experiments show that HGFormer yields more robust semantic segmentation results than per-pixel classification methods and flat grouping transformers, and outperforms previous methods significantly. Code will be available at https://github.com/dingjiansw101/HGFormer.
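
The idea of combining mask classification results at two scales can be illustrated with a small toy sketch. This is not the HGFormer implementation: the function names, the hard 0/1 group assignments, and the weighted-average fusion rule (`part_weight`) are all assumptions made for illustration, since the abstract does not specify how the two scales are fused. Each pixel inherits the class probabilities of its part-level group and of the whole-level group that contains that part, and the two are averaged before taking the most likely class.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def combine_two_scale_predictions(pixel_to_part, part_to_whole,
                                  part_probs, whole_probs, part_weight=0.5):
    """Fuse part-level and whole-level class probabilities per pixel.

    Hypothetical helper (not from the paper's code):
      pixel_to_part: N x P assignment of pixels to part-level groups
      part_to_whole: P x W assignment of parts to whole-level groups
      part_probs:    P x K class probabilities of each part
      whole_probs:   W x K class probabilities of each whole
    Returns one class index per pixel.
    """
    # Project part-level class probabilities onto pixels: (N x P) @ (P x K).
    part_scores = matmul(pixel_to_part, part_probs)
    # Compose the two groupings, then project whole-level probabilities.
    pixel_to_whole = matmul(pixel_to_part, part_to_whole)
    whole_scores = matmul(pixel_to_whole, whole_probs)

    labels = []
    for p_row, w_row in zip(part_scores, whole_scores):
        # Assumed fusion rule: weighted average of the two scales.
        fused = [part_weight * p + (1 - part_weight) * w
                 for p, w in zip(p_row, w_row)]
        labels.append(max(range(len(fused)), key=fused.__getitem__))
    return labels


# Toy data: 4 pixels, 3 parts, 2 wholes, 2 classes.
pixel_to_part = [[1, 0, 0], [0, 1, 0], [0, 0, 1], [0, 0, 1]]
part_to_whole = [[1, 0], [1, 0], [0, 1]]
part_probs = [[0.9, 0.1], [0.4, 0.6], [0.2, 0.8]]
whole_probs = [[0.8, 0.2], [0.1, 0.9]]

labels = combine_two_scale_predictions(
    pixel_to_part, part_to_whole, part_probs, whole_probs)
print(labels)
```

In this toy input the second pixel belongs to a part whose own prediction leans slightly toward class 1, but the whole it belongs to is confidently class 0, so the fused prediction follows the whole. Setting `part_weight=1.0` ignores the whole-level scale and recovers the part-only prediction.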

updated: Mon May 22 2023 13:33:41 GMT+0000 (UTC)

published: Mon May 22 2023 13:33:41 GMT+0000 (UTC)

arXiv

