MFFN: Multi-view Feature Fusion Network for Camouflaged Object Detection

Dehua Zheng; Xiaochen Zheng; Laurence T. Yang; Yuan Gao; Chenlu Zhu; Yiheng Ruan

Recent research on camouflaged object detection (COD) aims to segment highly concealed objects hidden in complex surroundings. Tiny, fuzzy camouflaged objects are visually almost indistinguishable from their background, and current single-view COD detectors are sensitive to background distractors. As a result, the blurred boundaries and variable shapes of camouflaged objects are difficult to capture fully with a single-view detector. To overcome these obstacles, we propose a behavior-inspired framework, called Multi-view Feature Fusion Network (MFFN), which mimics how humans find indistinct objects in images, i.e., by observing from multiple angles, distances, and perspectives. The key idea is to generate multiple ways of observation (multi-view) by data augmentation and apply them as inputs. MFFN captures critical boundary and semantic information by comparing and fusing the extracted multi-view features. In addition, MFFN exploits the dependence and interaction between views and channels. Specifically, our method leverages the complementary information between different views through a two-stage attention module called Co-attention of Multi-view (CAMV). We also design a local-overall module called Channel Fusion Unit (CFU) to iteratively explore the channel-wise contextual clues of diverse feature maps. Experimental results show that our method performs favorably against existing state-of-the-art methods when trained on the same data. The code will be available at https://github.com/dwardzheng/MFFN_COD.
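The idea of generating several views of one image by data augmentation can be illustrated with a minimal sketch. The abstract does not list the exact augmentations, so the choices below (horizontal and vertical flips for different angles, a 2x nearest-neighbour close-up for a different distance) are assumptions, and all function names are hypothetical; this is not the code from the authors' repository.

```python
# Illustrative sketch only: the specific views (flips, 2x close-up) are
# assumptions, not the augmentations confirmed by the abstract.
from typing import Dict, List

Image = List[List[float]]  # grayscale image stored as rows of pixel values


def flip_horizontal(img: Image) -> Image:
    """Mirror the image left-to-right (stands in for a different angle)."""
    return [row[::-1] for row in img]


def flip_vertical(img: Image) -> Image:
    """Mirror the image top-to-bottom."""
    return [row[:] for row in img[::-1]]


def close_up(img: Image, factor: int = 2) -> Image:
    """Nearest-neighbour upsampling (stands in for a closer distance)."""
    out: Image = []
    for row in img:
        # Repeat each pixel `factor` times horizontally ...
        wide = [pixel for pixel in row for _ in range(factor)]
        # ... and each widened row `factor` times vertically.
        out.extend(wide[:] for _ in range(factor))
    return out


def make_views(img: Image) -> Dict[str, Image]:
    """Return the original image plus its augmented views, keyed by name."""
    return {
        "original": [row[:] for row in img],
        "flip_h": flip_horizontal(img),
        "flip_v": flip_vertical(img),
        "close_up": close_up(img),
    }


if __name__ == "__main__":
    views = make_views([[1.0, 2.0], [3.0, 4.0]])
    for name, view in views.items():
        print(name, view)
```

In a multi-view detector each of these views would be passed through a feature extractor and the resulting features compared and fused; the sketch stops at view generation because the abstract does not specify the fusion details.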

updated: Wed Oct 19 2022 17:08:16 GMT+0000 (UTC)

published: Wed Oct 12 2022 16:12:58 GMT+0000 (UTC)

arXiv

