Rethinking Atrous Convolution for Semantic Image Segmentation

Liang-Chieh Chen; George Papandreou; Florian Schroff; Hartwig Adam

In this work, we revisit atrous convolution, a powerful tool for explicitly adjusting a filter's field of view and for controlling the resolution of feature responses computed by Deep Convolutional Neural Networks, in the application of semantic image segmentation. To handle the problem of segmenting objects at multiple scales, we design modules that employ atrous convolution in cascade or in parallel, capturing multi-scale context by adopting multiple atrous rates. Furthermore, we propose to augment our previously proposed Atrous Spatial Pyramid Pooling module, which probes convolutional features at multiple scales, with image-level features that encode global context, which further boosts performance. We also elaborate on implementation details and share our experience in training our system. The proposed "DeepLabv3" system significantly improves over our previous DeepLab versions without DenseCRF post-processing and attains performance comparable to other state-of-the-art models on the PASCAL VOC 2012 semantic image segmentation benchmark.
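To make the idea of an atrous rate concrete, here is a minimal pure-Python sketch of one-dimensional atrous convolution with several rates applied in parallel. It is an illustration only, not the DeepLabv3 implementation. The function names, the centered zero-padded indexing, and the rates used are assumptions chosen for the example. The point it shows is that a larger rate widens the filter's field of view without adding weights.

```python
def effective_kernel_size(kernel_size, rate):
    """Number of input positions spanned by one application of the filter."""
    return kernel_size + (kernel_size - 1) * (rate - 1)


def atrous_conv1d(signal, weights, rate):
    """1-D atrous convolution with zero padding and same-length output.

    Assumed (illustrative) form: y[i] = sum_j w[j] * x[i + rate * (j - center)],
    so consecutive filter taps are `rate` samples apart in the input.
    """
    center = len(weights) // 2
    output = []
    for i in range(len(signal)):
        acc = 0.0
        for j, w in enumerate(weights):
            idx = i + rate * (j - center)
            # Positions outside the signal are treated as zeros.
            if 0 <= idx < len(signal):
                acc += w * signal[idx]
        output.append(acc)
    return output


def parallel_atrous_branches(signal, weights, rates):
    """Apply the same filter at several rates, one branch per rate."""
    return {rate: atrous_conv1d(signal, weights, rate) for rate in rates}


if __name__ == "__main__":
    impulse = [0, 0, 0, 0, 1, 0, 0, 0, 0]
    branches = parallel_atrous_branches(impulse, [1, 2, 3], rates=(1, 2, 4))
    for rate, response in branches.items():
        print(rate, effective_kernel_size(3, rate), response)
```

With a unit impulse as input, each branch returns the same three weights spread `rate` samples apart, so the parameter count stays fixed while the covered span grows from 3 to 5 to 9 samples.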

updated: Tue Dec 05 2017 18:06:21 GMT+0000 (UTC)

published: Sat Jun 17 2017 22:48:57 GMT+0000 (UTC)

arXiv
