MSCFNet: A Lightweight Network With Multi-Scale Context Fusion for Real-Time Semantic Segmentation

Guangwei Gao; Guoan Xu; Yi Yu; Jin Xie; Jian Yang; Dong Yue

In recent years, striking a good trade-off between accuracy and inference speed has become the core issue for real-time semantic segmentation applications, which play a vital role in real-world scenarios such as autonomous driving systems and drones. In this study, we devise a novel lightweight network with a multi-scale context fusion scheme (MSCFNet), which uses an asymmetric encoder-decoder architecture to address this problem. More specifically, the encoder adopts several newly developed efficient asymmetric residual (EAR) modules, which are composed of factorized depth-wise convolution and dilated convolution. Meanwhile, instead of complicated computation, simple deconvolution is applied in the decoder to further reduce the number of parameters while still maintaining high segmentation accuracy. In addition, MSCFNet has branches with efficient attention modules at different stages of the network to capture multi-scale contextual information. We then combine these branches before the final classification to enhance the feature representation and improve segmentation efficiency. Comprehensive experiments on challenging datasets demonstrate that the proposed MSCFNet, which contains only 1.15M parameters, achieves 71.9% mean IoU on the Cityscapes test set and runs at over 50 FPS on a single Titan XP GPU.
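
To make the parameter savings behind factorized depth-wise and dilated convolution concrete, the following is a minimal sketch of the weight counts involved. It is not the authors' EAR module: the channel width (64), the kernel size (3), and all function names are illustrative assumptions, and bias terms are ignored.

```python
# Illustrative weight counts only; not the layout of the paper's EAR module.
# Channel width and kernel size below are assumed values, biases are ignored.

def standard_conv_params(c_in, c_out, k):
    """Weights in a dense k x k convolution."""
    return c_in * c_out * k * k


def depthwise_conv_params(channels, k):
    """Weights in a depth-wise k x k convolution (one filter per channel)."""
    return channels * k * k


def factorized_depthwise_params(channels, k):
    """Weights when a depth-wise k x k kernel is split into k x 1 and 1 x k."""
    return channels * k + channels * k


def effective_kernel_size(k, dilation):
    """Spatial extent covered by a k-tap kernel with the given dilation rate."""
    return k + (k - 1) * (dilation - 1)


if __name__ == "__main__":
    channels, k = 64, 3  # assumed values for illustration
    print("standard:", standard_conv_params(channels, channels, k))
    print("depth-wise:", depthwise_conv_params(channels, k))
    print("factorized depth-wise:", factorized_depthwise_params(channels, k))
    for d in (1, 2, 4):
        print("dilation", d, "-> effective size", effective_kernel_size(k, d))
```

Under these assumptions the dense layer holds 36,864 weights, the depth-wise layer 576, and the factorized depth-wise pair 384. Dilation adds no weights at all, yet a 3-tap kernel with dilation 4 spans 9 positions, which is how such modules can widen the receptive field cheaply.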

updated: Fri Jul 16 2021 09:10:51 GMT+0000 (UTC)

published: Wed Mar 24 2021 08:28:26 GMT+0000 (UTC)

arXiv
