Dilated Convolutions with Lateral Inhibitions for Semantic Image Segmentation

Yujiang Wang; Mingzhi Dong; Jie Shen; Yiming Lin; Maja Pantic

Dilated convolutions are widely used in deep semantic segmentation models because they enlarge a filter's receptive field without adding weights or sacrificing spatial resolution. However, since dilated convolutional filters possess no positional knowledge about the pixels on semantically meaningful contours, they can produce ambiguous predictions on object boundaries. In addition, although dilating a filter expands its receptive field, the total number of sampled pixels remains unchanged, and those pixels usually make up only a small fraction of the receptive field's total area. Inspired by the Lateral Inhibition (LI) mechanisms in human visual systems, we propose the dilated convolution with lateral inhibitions (LI-Convs) to overcome these limitations. Introducing LI mechanisms improves the convolutional filter's sensitivity to semantic object boundaries. Moreover, since LI-Convs implicitly take the pixels from the laterally inhibited zones into consideration, they can also extract features at a denser scale. By integrating LI-Convs into the Deeplabv3+ architecture, we propose the Lateral Inhibited Atrous Spatial Pyramid Pooling (LI-ASPP), the Lateral Inhibited MobileNet-V2 (LI-MNV2) and the Lateral Inhibited ResNet (LI-ResNet). Experimental results on three benchmark datasets (PASCAL VOC 2012, CelebAMask-HQ and ADE20K) show that our LI-based segmentation models outperform the baselines on all of them, thus verifying the effectiveness and generality of the proposed LI-Convs.
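The abstract's point about sparse sampling can be checked with a little arithmetic. The sketch below is not taken from the paper: the helper name `dilated_coverage` is hypothetical, and the 3x3 kernel and the dilation rates 1, 6, 12 and 18 are assumed here purely for illustration. It relies only on the standard relation that a square filter with kernel size k and dilation d spans d(k-1)+1 pixels per side while still sampling k*k of them.

```python
from typing import Tuple


def dilated_coverage(kernel_size: int, dilation: int) -> Tuple[int, int, float]:
    """Return (field side, sampled pixels, sampled fraction) for a square 2D filter.

    Hypothetical helper for illustration only; it is not part of the paper.
    """
    # A dilated filter spans dilation * (kernel_size - 1) + 1 pixels per side.
    side = dilation * (kernel_size - 1) + 1
    # Dilation inserts gaps between taps, so the number of taps is unchanged.
    sampled = kernel_size ** 2
    return side, sampled, sampled / side ** 2


if __name__ == "__main__":
    # Assumed, illustrative dilation rates for a 3x3 filter.
    for rate in (1, 6, 12, 18):
        side, sampled, fraction = dilated_coverage(3, rate)
        print(f"dilation={rate:2d}: {side}x{side} field, "
              f"{sampled} pixels sampled ({fraction:.1%} of the area)")
```

Under these assumptions the sampled fraction falls quickly as the dilation rate grows, which is the gap that, according to the abstract, LI-Convs address by implicitly taking pixels from the laterally inhibited zones into account.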

updated: Wed Jan 19 2022 13:12:50 GMT+0000 (UTC)

published: Fri Jun 05 2020 21:55:43 GMT+0000 (UTC)

arXiv
