FCN+: Global Receptive Convolution Makes FCN Great Again

Zhongying Deng; Xiaoyu Ren; Jin Ye; Junjun He; Yu Qiao

The fully convolutional network (FCN) is a seminal work for semantic segmentation. However, due to its limited receptive field, FCN cannot effectively capture global context information, which is vital for semantic segmentation. As a result, it is outperformed by state-of-the-art methods that leverage different filter sizes to obtain larger receptive fields. Such a strategy, though, usually introduces more parameters and increases the computational cost. In this paper, we propose a novel global receptive convolution (GRC) to effectively increase the receptive field of FCN for context information extraction, which results in an improved FCN termed FCN+. The GRC provides a global receptive field for convolution without introducing any extra learnable parameters. The motivation of GRC is that different channels of a convolutional filter can have different grid sampling locations across the whole input feature map. Specifically, the GRC first divides the channels of the filter into two groups. The grid sampling locations of the first group are shifted to different spatial coordinates across the whole feature map, according to their channel indexes. This helps the convolutional filter capture global context information. The grid sampling locations of the second group remain unchanged to keep the original location information. By convolving with both groups, the GRC can integrate the global context into the original location information of each pixel for better dense prediction results. With the GRC built in, FCN+ can achieve performance comparable to state-of-the-art methods for semantic segmentation tasks, as verified on PASCAL VOC 2012, Cityscapes, and ADE20K.
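The two-group idea described in the abstract can be illustrated with a small, dependency-free sketch. This is not the authors' implementation: the function names, the `shifted_ratio` parameter, the evenly spaced raster-scan offsets, and the wrap-around at the borders are all assumptions made for illustration, since the abstract only states that the shifts depend on the channel index. Shifting an input channel is used here as a stand-in for shifting the filter's sampling grid; the two are equivalent up to border handling.

```python
def channel_offsets(num_shifted, height, width):
    """Return one (dy, dx) offset per shifted channel, spread over the map.

    Assumption: offsets are evenly spaced along a raster scan of the
    H x W grid; the abstract only says they depend on the channel index.
    """
    total = height * width
    offsets = []
    for c in range(num_shifted):
        flat = (c * total) // num_shifted  # evenly spaced flat position
        offsets.append((flat // width, flat % width))
    return offsets


def shift_channel(channel, dy, dx):
    """Read the channel at (y + dy, x + dx), wrapping at borders (assumed)."""
    h, w = len(channel), len(channel[0])
    return [[channel[(y + dy) % h][(x + dx) % w] for x in range(w)]
            for y in range(h)]


def conv2d_single(channels, kernel):
    """Zero-padded, stride-1 convolution producing one output channel.

    channels: list of C maps (H x W); kernel: list of C maps (k x k), k odd.
    """
    h, w = len(channels[0]), len(channels[0][0])
    k = len(kernel[0])
    r = k // 2
    out = [[0.0] * w for _ in range(h)]
    for fmap, kmap in zip(channels, kernel):
        for y in range(h):
            for x in range(w):
                acc = 0.0
                for i in range(k):
                    for j in range(k):
                        yy, xx = y + i - r, x + j - r
                        if 0 <= yy < h and 0 <= xx < w:
                            acc += kmap[i][j] * fmap[yy][xx]
                out[y][x] += acc
    return out


def grc_single(channels, kernel, shifted_ratio=0.5):
    """Sketch of a GRC-style convolution for a single output filter.

    Group 1 (the first `shifted_ratio` share of channels) samples from
    shifted locations; group 2 keeps its original sampling locations.
    The kernel has the same shape as for a regular convolution, so no
    learnable parameters are added.
    """
    h, w = len(channels[0]), len(channels[0][0])
    num_shifted = int(len(channels) * shifted_ratio)
    offsets = channel_offsets(num_shifted, h, w)
    moved = [shift_channel(ch, dy, dx)
             for ch, (dy, dx) in zip(channels, offsets)]
    return conv2d_single(moved + channels[num_shifted:], kernel)
```

With four 4x4 input channels and `shifted_ratio=0.5`, the first two channels receive the offsets (0, 0) and (2, 0) under these assumptions, so a value stored two rows away in channel 1 contributes to the output at the current pixel, while channels 2 and 3 contribute only from their usual neighbourhood.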

updated: Wed Mar 08 2023 14:04:07 GMT+0000 (UTC)

published: Wed Mar 08 2023 14:04:07 GMT+0000 (UTC)

arXiv
