Differentiating Features for Scene Segmentation Based on Dedicated Attention Mechanisms

Zhiqiang Xiong; Zhicheng Wang; Zhaohui Yu; Xi Gu

Semantic segmentation is a challenging task in scene parsing that requires both context information and rich spatial information. In this paper, we differentiate features for scene segmentation based on dedicated attention mechanisms (DF-DAM), and propose two attention modules that optimize the high-level and low-level features of the encoder, respectively. Specifically, we use the high-level features of ResNet as the source of context information and the low-level features as the source of spatial information, and optimize them with an attention fusion module and a 2D position attention module, respectively. In the attention fusion module, we adopt dual channel weights to selectively adjust the channel maps of the features from the two highest stages of ResNet, and fuse them to obtain context information. In the 2D position attention module, we use the context information obtained by the attention fusion module to guide the selection of the lowest-stage features of ResNet as supplementary spatial information. Finally, the two sets of information obtained by the two modules are simply fused to obtain the prediction. We evaluate our approach on the Cityscapes and PASCAL VOC 2012 datasets. Notably, our architecture contains no complicated or redundant processing modules, which greatly reduces its complexity, and it achieves 82.3% mean IoU on the PASCAL VOC 2012 test set without pre-training on the MS-COCO dataset.
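The data flow described in the abstract can be illustrated with a small, dependency-free Python sketch. The abstract does not give the exact formulas, so everything concrete here is an assumption made for illustration: the channel weights are taken as a sigmoid of each channel's global average, the 2D position mask as a sigmoid of the channel-wise mean of the context, the final fusion as channel concatenation, and all feature maps are assumed to share one spatial size (no upsampling). The function names are hypothetical and this is not the authors' implementation.

```python
import math

# Feature maps are nested lists indexed as feat[channel][row][col].

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def channel_weights(feat):
    # Assumed form: one weight per channel, the sigmoid of its global average.
    return [sigmoid(sum(map(sum, ch)) / (len(ch) * len(ch[0]))) for ch in feat]

def scale_channels(feat, weights):
    # Multiply every value in a channel by that channel's weight.
    return [[[w * v for v in row] for row in ch] for ch, w in zip(feat, weights)]

def attention_fusion(high_a, high_b):
    # Reweight the channels of two high-level features separately
    # (one reading of "dual channel weight"), then sum them element-wise.
    a = scale_channels(high_a, channel_weights(high_a))
    b = scale_channels(high_b, channel_weights(high_b))
    return [[[x + y for x, y in zip(ra, rb)] for ra, rb in zip(ca, cb)]
            for ca, cb in zip(a, b)]

def position_mask(context):
    # Assumed form: a 2D mask, the sigmoid of the mean over channels per position.
    c, h, w = len(context), len(context[0]), len(context[0][0])
    return [[sigmoid(sum(context[k][i][j] for k in range(c)) / c)
             for j in range(w)] for i in range(h)]

def position_attention(low, context):
    # Gate every channel of the low-level feature with the context-derived mask.
    mask = position_mask(context)
    return [[[m * v for m, v in zip(mrow, row)] for mrow, row in zip(mask, ch)]
            for ch in low]

def fuse(context, spatial):
    # "Simply fused" is modeled here as concatenation along the channel axis.
    return context + spatial

# Toy inputs: two high-level features (2 channels, 2x2) and one low-level feature.
high_a = [[[0.0, 0.0], [0.0, 0.0]], [[1.0, 1.0], [1.0, 1.0]]]
high_b = [[[0.0, 0.0], [0.0, 0.0]], [[0.0, 0.0], [0.0, 0.0]]]
low = [[[2.0, 2.0], [2.0, 2.0]]]

context = attention_fusion(high_a, high_b)
spatial = position_attention(low, context)
fused = fuse(context, spatial)
print(len(fused), "channels after fusion")
```

In a real network these operations would act on tensors, with learned layers producing the weights and a prediction head after the fusion; the sketch only shows which feature feeds which module.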

updated: Tue Nov 19 2019 08:17:59 GMT+0000 (UTC)

published: Tue Nov 19 2019 08:17:59 GMT+0000 (UTC)

arXiv

