A Unified Structure for Efficient RGB and RGB-D Salient Object Detection

Peng Peng; Yong-Jie Li

Salient object detection (SOD) has been well studied in recent years, especially with deep neural networks. However, SOD on RGB images and SOD on RGB-D images are usually treated as two different tasks, each requiring its own specifically designed network structure. In this paper, we propose a unified and efficient structure with a cross-attention context extraction (CRACE) module that addresses both SOD tasks. The CRACE module receives two inputs (for RGB SOD) or three inputs (for RGB-D SOD) and fuses them appropriately. A simple, unified, feature pyramid network (FPN)-like structure built from CRACE modules conveys and refines the results under multi-level supervision of saliency and boundaries. The proposed structure is simple yet effective: it extracts the rich context information of RGB and depth and fuses it efficiently. Experimental results show that our method outperforms other state-of-the-art methods on both RGB and RGB-D SOD tasks across various datasets and in terms of most metrics.
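The abstract does not specify how the CRACE module fuses its inputs, so the following is only a minimal sketch of the general idea of attention-based fusion over two or three feature vectors; it is not the authors' implementation. The function name `fuse_inputs`, the use of plain scaled dot-product attention, and the final averaging step are all assumptions made for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_inputs(inputs):
    """Fuse two or three equal-length feature vectors (hypothetical helper).

    Assumption: each input acts in turn as a query that attends over all
    inputs with scaled dot-product attention; the attended vectors are then
    averaged. This stands in for, and is not, the paper's CRACE module.
    """
    if len(inputs) not in (2, 3):
        raise ValueError("expected two (RGB) or three (RGB-D) inputs")
    dim = len(inputs[0])
    if dim == 0 or any(len(vec) != dim for vec in inputs):
        raise ValueError("all inputs must share the same non-zero length")

    scale = math.sqrt(dim)
    attended = []
    for query in inputs:
        # Similarity of this query to every input, including itself.
        scores = [sum(q * k for q, k in zip(query, key)) / scale for key in inputs]
        weights = softmax(scores)
        # Weighted sum of the inputs under this query's attention weights.
        attended.append(
            [sum(w * vec[i] for w, vec in zip(weights, inputs)) for i in range(dim)]
        )

    # Average the per-query results into one fused vector.
    return [sum(vec[i] for vec in attended) / len(attended) for i in range(dim)]


# Two inputs, as in the RGB case (values are made up for illustration).
fused_rgb = fuse_inputs([[1.0, 0.0], [0.5, 0.5]])
# Three inputs, as in the RGB-D case (the third stands in for a depth feature).
fused_rgbd = fuse_inputs([[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]])
```

Because every attention weight is non-negative and the weights sum to one, each coordinate of the fused vector stays within the range spanned by the inputs, which makes the sketch easy to sanity-check. A real module would operate on convolutional feature maps with learned projections.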

updated: Tue Dec 01 2020 12:12:03 GMT+0000 (UTC)

published: Tue Dec 01 2020 12:12:03 GMT+0000 (UTC)

arXiv
