Attention guided global enhancement and local refinement network for semantic segmentation

Jiangyun Li; Sen Zha; Chen Chen; Meng Ding; Tianxiang Zhang; Hong Yu

The encoder-decoder architecture is widely used for lightweight semantic segmentation networks. However, its performance lags behind that of a well-designed Dilated-FCN model, for two main reasons. First, the upsampling methods commonly used in the decoder, such as interpolation and deconvolution, have only a local receptive field and therefore cannot encode global context. Second, because early encoder layers capture little semantic information, the low-level features passed through skip connections may introduce noise into the decoder. To tackle these challenges, a Global Enhancement Method is proposed that aggregates global information from high-level feature maps and adaptively distributes it to different decoder layers, alleviating the shortage of global context during upsampling. In addition, a Local Refinement Module is developed that uses the decoder features as semantic guidance to refine the noisy encoder features before the two are fused. The two methods are then integrated into a Context Fusion Block, on which a novel Attention guided Global enhancement and Local refinement Network (AGLN) is built. Extensive experiments on the PASCAL Context, ADE20K, and PASCAL VOC 2012 datasets demonstrate the effectiveness of the proposed approach. In particular, with a vanilla ResNet-101 backbone, AGLN achieves a state-of-the-art result (56.23% mean IoU) on the PASCAL Context dataset. The code is available at https://github.com/zhasen1996/AGLN.
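For readers unfamiliar with the mean IoU metric quoted above, the following is a minimal sketch of how it can be computed from flat lists of predicted and ground-truth class labels. It is not taken from the AGLN repository: the function name `mean_iou` and the toy labels are hypothetical, and it assumes that classes absent from both prediction and ground truth are skipped. Benchmark evaluations typically accumulate counts over the whole dataset and may exclude an ignore label, which this sketch does not handle.

```python
def mean_iou(pred, target, num_classes):
    """Mean intersection-over-union across classes (illustrative helper)."""
    intersection = [0] * num_classes
    union = [0] * num_classes
    for p, t in zip(pred, target):
        if p == t:
            # Correct pixel: counts once toward both intersection and union.
            intersection[p] += 1
            union[p] += 1
        else:
            # Wrong pixel: enlarges the union of the predicted and the true class.
            union[p] += 1
            union[t] += 1
    # Assumption: classes that never appear (union == 0) are left out of the mean.
    ious = [i / u for i, u in zip(intersection, union) if u > 0]
    return sum(ious) / len(ious) if ious else float("nan")


# Toy example with three classes and six pixels (made-up labels).
pred = [0, 0, 1, 1, 2, 2]
target = [0, 1, 1, 1, 2, 0]
print(f"{100 * mean_iou(pred, target, 3):.2f}% mean IoU")
```

In the toy example the per-class IoUs are 1/3, 2/3, and 1/2, so the mean is 50%.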

updated: Sat Apr 09 2022 02:32:24 GMT+0000 (UTC)

published: Sat Apr 09 2022 02:32:24 GMT+0000 (UTC)

arXiv
