CNN-based RGB-D Salient Object Detection: Learn, Select and Fuse

Hao Chen; Youfu Li

The goal of this work is to present a systematic solution for RGB-D salient object detection, which addresses the following three aspects with a unified framework: modal-specific representation learning, complementary cue selection and cross-modal complement fusion. To learn discriminative modal-specific features, we propose a hierarchical cross-modal distillation scheme, in which the well-learned source modality provides supervisory signals to facilitate the learning process for the new modality. To better extract the complementary cues, we formulate a residual function to incorporate complements from the paired modality adaptively. Furthermore, a top-down fusion structure is constructed for sufficient cross-modal interactions and cross-level transmissions. The experimental results demonstrate the effectiveness of the proposed cross-modal distillation scheme in zero-shot saliency detection and pre-training on a new modality, as well as the advantages in selecting and fusing cross-modal/cross-level complements.
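
As a rough illustration of the residual formulation mentioned in the abstract, the sketch below adds a gated complement from the paired modality onto a modality's own features. The abstract does not give the actual residual function, so everything here is an assumption made for illustration: the function name `residual_complement`, the single scalar `gate_logit`, and the use of plain Python lists as feature vectors are all hypothetical and are not the authors' implementation.

```python
import math


def residual_complement(own, paired, gate_logit=0.0):
    """Fuse two equal-length feature vectors as own + gate * paired.

    Hypothetical sketch: the gate is one scalar, sigmoid(gate_logit),
    standing in for whatever adaptive selection the paper learns.
    """
    if len(own) != len(paired):
        raise ValueError("feature vectors must have equal length")
    # Squash the logit into (0, 1) so the complement is scaled, not replaced.
    gate = 1.0 / (1.0 + math.exp(-gate_logit))
    # Residual form: the modality's own features pass through unchanged,
    # and only the gated complement is added on top.
    return [o + gate * p for o, p in zip(own, paired)]


rgb_features = [1.0, 2.0]    # made-up values for illustration
depth_features = [2.0, 4.0]  # made-up values for illustration
fused = residual_complement(rgb_features, depth_features, gate_logit=0.0)
```

The point of the residual form in this sketch is that when the paired modality contributes nothing (all zeros), the output reduces to the original features, so the complement can only add information on top of what the modality already encodes.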

updated: Fri Sep 20 2019 03:53:53 GMT+0000 (UTC)

published: Fri Sep 20 2019 03:53:53 GMT+0000 (UTC)

arXiv
