ShapeConv: Shape-aware Convolutional Layer for Indoor RGB-D Semantic Segmentation

Jinming Cao; Hanchao Leng; Dani Lischinski; Danny Cohen-Or; Changhe Tu; Yangyan Li

RGB-D semantic segmentation has attracted increasing attention over the past few years. Existing methods mostly process RGB and depth features with the same homogeneous convolution operators, ignoring the intrinsic differences between the two. RGB values capture photometric appearance in the projected image space, whereas a depth feature encodes both the shape of the local geometry and its base (its whereabouts) in a larger context. Compared with the base, the shape is likely more inherent and more strongly connected to the semantics, and is therefore more critical for segmentation accuracy. Inspired by this observation, we introduce a Shape-aware Convolutional layer (ShapeConv) for processing depth features. The depth feature is first decomposed into a shape component and a base component; two learnable weights are then introduced, one applied to each component independently; finally, a convolution is applied to the re-weighted combination of the two components. ShapeConv is model-agnostic and can be easily integrated into most CNNs as a replacement for vanilla convolutional layers in semantic segmentation. Extensive experiments on three challenging indoor RGB-D semantic segmentation benchmarks, namely NYU-Dv2 (-13, -40), SUN RGB-D, and SID, demonstrate the effectiveness of ShapeConv when applied to five popular architectures. Moreover, ShapeConv improves the performance of these CNNs without adding any computation or memory cost at inference. The reason is that the learned weights that balance the importance of the shape and base components become constants at inference time and can therefore be fused into the following convolution, yielding a network that is identical to one with vanilla convolutional layers.
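The decompose, re-weight, convolve pipeline and the inference-time fusion can be illustrated with a small sketch. It is not the authors' implementation and rests on assumptions the abstract does not state: the base component is taken to be the mean of a flattened patch, the shape component is the residual after subtracting that mean, and each learnable weight is simplified to a single scalar (`w_base`, `w_shape`). All function names are hypothetical. Because every step is linear, constant weights can be folded into the kernel, leaving an ordinary convolution at inference.

```python
import random

def decompose(patch):
    """Split a flattened patch into (base, shape).
    Assumption: base = patch mean, shape = residual after removing the mean."""
    base = sum(patch) / len(patch)
    shape = [v - base for v in patch]
    return base, shape

def vanilla_conv(patch, kernel):
    """Plain convolution response at one location: a dot product."""
    return sum(k * p for k, p in zip(kernel, patch))

def shape_conv(patch, kernel, w_base, w_shape):
    """Training-time form: re-weight both components, recombine, convolve.
    Assumption: each learnable weight is a single scalar."""
    base, shape = decompose(patch)
    recombined = [w_base * base + w_shape * s for s in shape]
    return vanilla_conv(recombined, kernel)

def fuse_kernel(kernel, w_base, w_shape):
    """Inference-time form: fold the now-constant weights into the kernel.
    Follows from K.(w_s*P + (w_b - w_s)*mean(P)) = (w_s*K + (w_b - w_s)*mean(K)).P"""
    k_mean = sum(kernel) / len(kernel)
    return [w_shape * k + (w_base - w_shape) * k_mean for k in kernel]

random.seed(0)
patch = [random.uniform(0.0, 5.0) for _ in range(9)]    # flattened 3x3 depth patch
kernel = [random.uniform(-1.0, 1.0) for _ in range(9)]  # flattened 3x3 kernel
w_base, w_shape = 0.3, 1.7                              # illustrative values only

train_out = shape_conv(patch, kernel, w_base, w_shape)
infer_out = vanilla_conv(patch, fuse_kernel(kernel, w_base, w_shape))
print(abs(train_out - infer_out) < 1e-9)
```

Under these assumptions the shape component is unchanged when a constant offset is added to the whole patch, which is one way to read the abstract's claim that shape is more inherent than base: moving the same local geometry nearer or farther changes only the base.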

updated: Tue Aug 24 2021 05:36:16 GMT+0000 (UTC)

published: Tue Aug 24 2021 05:36:16 GMT+0000 (UTC)

arXiv
