Multi-Scale Feature Aggregation by Cross-Scale Pixel-to-Region Relation Operation for Semantic Segmentation

Yechao Bai; Ziyuan Huang; Lyuyu Shen; Hongliang Guo; Marcelo H. Ang Jr; Daniela Rus

Exploiting multi-scale features has shown great potential for semantic segmentation. The features are commonly aggregated by summation or concatenation (concat) followed by convolutional (conv) layers. However, this passes the high-level context down to the next level of the hierarchy in full, without considering how the two levels relate. In this work, we aim to enable the low-level feature to aggregate complementary context from adjacent high-level feature maps through a cross-scale pixel-to-region relation operation. We leverage cross-scale context propagation so that even the high-resolution low-level features can capture long-range dependencies. To this end, we employ an efficient feature pyramid network to obtain multi-scale features. We propose a Relational Semantics Extractor (RSE) and a Relational Semantics Propagator (RSP) for context extraction and propagation, respectively. We then stack several RSPs into an RSP head to achieve a progressive top-down distribution of the context. Experimental results on two challenging datasets, Cityscapes and COCO, demonstrate that the RSP head performs competitively on both semantic segmentation and panoptic segmentation with high efficiency. It outperforms DeepLabV3 [1] by 0.7% with 75% fewer FLOPs (multiply-adds) in the semantic segmentation task.
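The abstract does not spell out how the pixel-to-region relation is computed, so the following is only a rough sketch of the general idea, not the authors' RSE/RSP design. It assumes the relation is a scaled dot-product affinity between each low-level pixel and each position of the coarser high-level map (each such position standing in for a "region"), and that the aggregated context is added back residually. The function names `cross_scale_relation` and `top_down` and all of these design choices are hypothetical.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cross_scale_relation(low, high):
    """Let every pixel of `low` gather context from the regions of `high`.

    low:  (C, H, W) high-resolution, low-level feature map.
    high: (C, h, w) coarser, high-level feature map; each position is
          treated as one region (an assumption, not taken from the paper).
    Returns a (C, H, W) map: `low` plus the aggregated context.
    """
    C, H, W = low.shape
    Ch, h, w = high.shape
    assert C == Ch, "both scales must share the channel dimension"

    queries = low.reshape(C, H * W).T    # (H*W, C), one query per pixel
    regions = high.reshape(C, h * w).T   # (h*w, C), one key/value per region

    # Pixel-to-region affinity; each row sums to 1 over the regions.
    affinity = softmax(queries @ regions.T / np.sqrt(C), axis=-1)  # (H*W, h*w)

    # Each pixel receives a weighted mix of region features.
    context = affinity @ regions         # (H*W, C)

    # Residual addition is an assumed way of merging the context.
    return low + context.T.reshape(C, H, W)


def top_down(features):
    """Propagate context progressively from the coarsest level downward.

    features: list of (C, H_i, W_i) maps, ordered fine to coarse.
    """
    outputs = [features[-1]]             # coarsest level is passed through
    for low in reversed(features[:-1]):
        outputs.append(cross_scale_relation(low, outputs[-1]))
    return outputs[::-1]                 # restore fine-to-coarse order
```

In this sketch every pixel of the fine map attends to all positions of the coarse map, which is one way a high-resolution feature could pick up long-range context at a cost proportional to H*W*h*w rather than (H*W)^2.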

updated: Thu Jun 03 2021 10:49:48 GMT+0000 (UTC)

published: Thu Jun 03 2021 10:49:48 GMT+0000 (UTC)

arXiv
