A Unified Efficient Pyramid Transformer for Semantic Segmentation

Fangrui Zhu; Yi Zhu; Li Zhang; Chongruo Wu; Yanwei Fu; Mu Li

Semantic segmentation is a challenging problem because context is difficult to model in complex scenes and classes are easily confused along boundaries. Most of the literature focuses on either context modeling or boundary refinement, which is less generalizable in open-world scenarios. In this work, we advocate a unified framework (UN-EPT) that segments objects by considering both context information and boundary artifacts. We first adapt a sparse sampling strategy to incorporate a transformer-based attention mechanism for efficient context modeling. In addition, a separate spatial branch is introduced to capture image details for boundary refinement. The whole model can be trained end-to-end. We demonstrate promising performance on three popular semantic segmentation benchmarks with a low memory footprint. Code will be released soon.
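To illustrate why sparse sampling lowers the cost of attention, here is a minimal sketch in plain Python. It is not the UN-EPT implementation: the abstract does not say how sample positions are chosen, so uniform random sampling is used here purely as a stand-in, and the function name `sparse_attention` and its parameters are hypothetical. Each query attends to only `num_samples` key positions, so the work scales with N × k rather than N × N for N positions.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def sparse_attention(queries, keys, values, num_samples, seed=0):
    """Scaled dot-product attention over a sampled subset of keys.

    Assumption: sample positions are drawn uniformly at random per query.
    This is a placeholder for whatever sampling rule a real model uses.
    """
    rng = random.Random(seed)
    dim = len(queries[0])
    num_samples = min(num_samples, len(keys))
    outputs = []
    for q in queries:
        # Pick a small set of key positions instead of using all of them.
        idx = rng.sample(range(len(keys)), num_samples)
        # Scaled dot-product scores against the sampled keys only.
        scores = [
            sum(a * b for a, b in zip(q, keys[i])) / math.sqrt(dim)
            for i in idx
        ]
        weights = softmax(scores)
        # Weighted sum of the sampled values, one output channel at a time.
        out = [
            sum(w * values[i][d] for w, i in zip(weights, idx))
            for d in range(len(values[0]))
        ]
        outputs.append(out)
    return outputs
```

Setting `num_samples` to the number of keys recovers ordinary dense attention, which makes the sketch easy to compare against a full-attention baseline.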

updated: Thu Jul 29 2021 17:47:32 GMT+0000 (UTC)

published: Thu Jul 29 2021 17:47:32 GMT+0000 (UTC)

arXiv
