Pixel Difference Convolutional Network for RGB-D Semantic Segmentation

Jun Yang; Lizhi Bai; Yaoru Sun; Chunqi Tian; Maoyu Mao; Guorun Wang

RGB-D semantic segmentation can be advanced with convolutional neural networks (CNNs) thanks to the availability of Depth data. Objects that are hard to discriminate from 2D appearance alone can, in some cases, be well separated using local pixel differences and geometric patterns in Depth. Because of their fixed grid kernel structure, however, CNNs lack the ability to capture detailed, fine-grained information and thus cannot achieve accurate pixel-level semantic segmentation. To solve this problem, we propose a Pixel Difference Convolutional Network (PDCNet) that captures detailed intrinsic patterns by aggregating both intensity and gradient information, over a local range for Depth data and a global range for RGB data, respectively. Specifically, PDCNet consists of a Depth branch and an RGB branch. For the Depth branch, we propose a Pixel Difference Convolution (PDC) that captures local, detailed geometric information in Depth data by aggregating both intensity and gradient information. For the RGB branch, we contribute a lightweight Cascade Large Kernel (CLK) that extends PDC into CPDC, which exploits global context in RGB data to further boost performance. Consequently, local and global pixel differences from both modalities are seamlessly incorporated into PDCNet during information propagation. Experiments on two challenging benchmark datasets, NYUDv2 and SUN RGB-D, show that PDCNet achieves state-of-the-art performance on the semantic segmentation task.
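The abstract does not give the exact formula for PDC, so the following is only a rough illustration of what "aggregating both intensity and gradient information" in one convolution can look like. It assumes a central-difference-style formulation (borrowed from central difference convolution, not confirmed as PDCNet's definition): each output mixes an ordinary convolution over pixel intensities with a convolution over the differences x_i - x_c between each neighbour and the window's centre pixel. The function name `pixel_difference_conv` and the mixing weight `theta` are hypothetical; the sketch is single-channel, unpadded, and has no learned parameters.

```python
from typing import List

Grid = List[List[float]]


def pixel_difference_conv(x: Grid, w: Grid, theta: float = 0.7) -> Grid:
    """Hypothetical PDC-style layer (assumption, not the paper's exact op).

    For each 'valid' window position it computes
      intensity = sum_i w_i * x_i             (plain convolution)
      gradient  = sum_i w_i * (x_i - x_c)     (x_c = centre pixel of the window)
    and returns (1 - theta) * intensity + theta * gradient.
    """
    kh, kw = len(w), len(w[0])
    if kh % 2 == 0 or kw % 2 == 0:
        raise ValueError("kernel must have odd height and width so it has a centre pixel")
    ch, cw = kh // 2, kw // 2
    out_h, out_w = len(x) - kh + 1, len(x[0]) - kw + 1
    out: Grid = []
    for i in range(out_h):
        row: List[float] = []
        for j in range(out_w):
            centre = x[i + ch][j + cw]
            intensity = 0.0
            gradient = 0.0
            for a in range(kh):
                for b in range(kw):
                    p = x[i + a][j + b]
                    intensity += w[a][b] * p             # raw intensity term
                    gradient += w[a][b] * (p - centre)   # local pixel-difference term
            row.append((1.0 - theta) * intensity + theta * gradient)
        out.append(row)
    return out


if __name__ == "__main__":
    # Toy depth map with a vertical step edge between columns 2 and 3.
    depth = [[0.0, 0.0, 0.0, 1.0, 1.0] for _ in range(5)]
    ones = [[1.0] * 3 for _ in range(3)]
    for r in pixel_difference_conv(depth, ones, theta=1.0):
        print(r)
```

With `theta=1.0` only the difference term remains: on a flat depth region it is exactly zero, while around the step edge it responds with opposite signs on either side, which is the kind of local geometric cue the abstract attributes to the Depth branch. With `theta=0.0` the layer reduces to a plain convolution.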

updated: Thu Feb 23 2023 12:01:22 GMT+0000 (UTC)

published: Thu Feb 23 2023 12:01:22 GMT+0000 (UTC)

arXiv
