S2-FPN: Scale-aware Strip Attention Guided Feature Pyramid Network for Real-time Semantic Segmentation

Mohammed A. M. Elhassan; Chenhui Yang; Chenxi Huang; Tewodros Legesse Munea; Xin Hong

Modern high-performance semantic segmentation methods employ heavy backbones and dilated convolutions to extract relevant features. Although extracting features with both contextual and semantic information is critical for segmentation tasks, it incurs a large memory footprint and high computational cost, which is a problem for real-time applications. This paper presents a new model that achieves a trade-off between accuracy and speed for real-time road scene semantic segmentation. Specifically, we propose a lightweight model named Scale-aware Strip Attention Guided Feature Pyramid Network (S2-FPN). Our network consists of three main modules: the Attention Pyramid Fusion (APF) module, the Scale-aware Strip Attention Module (SSAM), and the Global Feature Upsample (GFU) module. APF adopts an attention mechanism to learn discriminative multi-scale features and help close the semantic gap between different levels. APF uses scale-aware attention to encode global context with a vertical stripping operation and to model long-range dependencies, which helps relate pixels with similar semantic labels. In addition, APF employs a channel-wise reweighting block (CRB) to emphasize the channel features. Finally, the decoder of S2-FPN adopts GFU to fuse features from APF and the encoder. Extensive experiments on two challenging semantic segmentation benchmarks demonstrate that our approach achieves a better accuracy/speed trade-off across different model settings. The proposed models achieve 76.2% mIoU at 87.3 FPS, 77.4% mIoU at 67 FPS, and 77.8% mIoU at 30.5 FPS on the Cityscapes dataset, and 69.6% mIoU, 71.0% mIoU, and 74.2% mIoU on the CamVid dataset. The code for this work will be made available at https://github.com/mohamedac29/S2-FPN.
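
The abstract names two attention ideas without giving their details: encoding global context with a vertical stripping operation, and channel-wise reweighting. The NumPy sketch below illustrates the general flavor of both and is not the authors' implementation. It assumes that "vertical stripping" means averaging each column over the height axis to obtain a 1 x W strip, that pixels attend to strip positions through a scaled dot product, and that channel reweighting is a sigmoid gate on globally pooled features. The function names `vertical_strip_attention` and `channel_reweight` are hypothetical, and all learned layers are omitted.

```python
import numpy as np


def softmax(x, axis):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def vertical_strip_attention(feat):
    """Illustrative strip attention on a (C, H, W) feature map.

    Assumption: the strip is obtained by averaging over the height axis,
    giving one C-dimensional descriptor per column. Each pixel then attends
    to the W strip positions instead of all H*W pixels, which keeps the
    attention matrix at size (H*W, W).
    """
    c, h, w = feat.shape
    strip = feat.mean(axis=1)                    # (C, W) column descriptors
    pixels = feat.reshape(c, h * w)              # (C, H*W)
    affinity = pixels.T @ strip / np.sqrt(c)     # (H*W, W) scaled dot product
    weights = softmax(affinity, axis=1)          # each row sums to 1
    context = strip @ weights.T                  # (C, H*W) aggregated context
    return feat + context.reshape(c, h, w)       # residual connection


def channel_reweight(feat):
    """Illustrative channel gate: global average pool, then sigmoid.

    Assumption: a real reweighting block would pass the pooled vector
    through learned layers before the sigmoid; those are left out here.
    """
    desc = feat.mean(axis=(1, 2))                # (C,) one value per channel
    gate = 1.0 / (1.0 + np.exp(-desc))           # sigmoid in (0, 1)
    return feat * gate[:, None, None]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.standard_normal((8, 16, 32))         # C=8, H=16, W=32
    y = channel_reweight(vertical_strip_attention(x))
    print(y.shape)
```

Attending to W strip positions rather than to every pixel is what makes this kind of attention cheap enough to consider for real-time use: the affinity matrix grows as H*W*W instead of (H*W)^2.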

updated: Wed Jun 15 2022 05:02:49 GMT+0000 (UTC)

published: Wed Jun 15 2022 05:02:49 GMT+0000 (UTC)

arXiv
