Feature Shrinkage Pyramid for Camouflaged Object Detection with Transformers

Zhou Huang; Hang Dai; Tian-Zhu Xiang; Shuo Wang; Huai-Xin Chen; Jie Qin; Huan Xiong

Vision transformers have recently shown strong global context modeling capabilities in camouflaged object detection (COD). However, they suffer from two major limitations: less effective locality modeling and insufficient feature aggregation in decoders. Both hinder camouflaged object detection, which depends on picking out subtle cues from backgrounds that are nearly indistinguishable from the object. To address these issues, we propose a novel transformer-based Feature Shrinkage Pyramid Network (FSPNet), which hierarchically decodes locality-enhanced neighboring transformer features through progressive shrinking. Specifically, we propose a non-local token enhancement module (NL-TEM) that uses the non-local mechanism to let neighboring tokens interact and explores graph-based high-order relations within tokens to enhance the local representations of transformers. Moreover, we design a feature shrinkage decoder (FSD) with adjacent interaction modules (AIM), which progressively aggregates adjacent transformer features through a layer-by-layer shrinkage pyramid to accumulate as many imperceptible but effective cues as possible for object information decoding. Extensive quantitative and qualitative experiments demonstrate that the proposed model significantly outperforms 24 existing competitors on three challenging COD benchmark datasets under six widely used evaluation metrics. Our code is publicly available at https://github.com/ZhouHuang23/FSPNet.
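The layer-by-layer shrinkage idea can be illustrated with a minimal sketch. This is not the authors' implementation: the pairing scheme (non-overlapping adjacent pairs, with an odd leftover carried to the next level), the function names, and the use of an element-wise mean in place of the learned adjacent interaction module are all assumptions made for illustration. Features are plain lists of floats rather than transformer token maps.

```python
from typing import Callable, List

Feature = List[float]
Merge = Callable[[Feature, Feature], Feature]


def average_merge(a: Feature, b: Feature) -> Feature:
    """Hypothetical stand-in for a learned interaction module: element-wise mean."""
    if len(a) != len(b):
        raise ValueError("features must have the same length")
    return [(x + y) / 2.0 for x, y in zip(a, b)]


def shrink_once(features: List[Feature], merge: Merge = average_merge) -> List[Feature]:
    """Merge non-overlapping adjacent pairs; carry an odd leftover forward (assumption)."""
    merged = [
        merge(features[i], features[i + 1])
        for i in range(0, len(features) - 1, 2)
    ]
    if len(features) % 2 == 1:
        merged.append(features[-1])
    return merged


def shrinkage_pyramid(features: List[Feature], merge: Merge = average_merge) -> List[List[Feature]]:
    """Return every level of the pyramid, from the input features down to one feature."""
    if not features:
        raise ValueError("at least one feature is required")
    levels = [list(features)]
    while len(levels[-1]) > 1:
        levels.append(shrink_once(levels[-1], merge))
    return levels


if __name__ == "__main__":
    # Twelve toy one-dimensional "layer features" with values 0.0 .. 11.0.
    toy = [[float(i)] for i in range(12)]
    pyramid = shrinkage_pyramid(toy)
    print([len(level) for level in pyramid])
```

Each level aggregates only neighboring features, so information from adjacent layers is combined gradually instead of being fused in a single step. Swapping `average_merge` for a learned module would change what each merge computes but not the shape of the pyramid.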

updated: Sun Mar 26 2023 20:50:58 GMT+0000 (UTC)

published: Sun Mar 26 2023 20:50:58 GMT+0000 (UTC)

arXiv
