CAVER: Cross-Modal View-Mixed Transformer for Bi-Modal Salient Object Detection

Youwei Pang; Xiaoqi Zhao; Lihe Zhang; Huchuan Lu

Most existing bi-modal (RGB-D and RGB-T) salient object detection (SOD) methods rely on the convolution operation and build complex interwoven fusion structures to integrate cross-modal information. The inherent local connectivity of convolution places a ceiling on the performance of these convolution-based methods. In this work, we rethink these tasks from the perspective of global information alignment and transformation. Specifically, the proposed cross-modal view-mixed transformer (CAVER) cascades several cross-modal integration units to construct a top-down, transformer-based information propagation path. CAVER treats multi-scale and multi-modal feature integration as a sequence-to-sequence context propagation and update process built on a novel view-mixed attention mechanism. In addition, because attention has quadratic complexity with respect to the number of input tokens, we design a parameter-free patch-wise token re-embedding strategy to reduce this cost. Extensive experiments on RGB-D and RGB-T SOD datasets show that a simple two-stream encoder-decoder framework equipped with the proposed components can surpass recent state-of-the-art methods.
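The abstract does not spell out how the patch-wise token re-embedding works. The following is a minimal NumPy sketch under an assumption: that "parameter-free re-embedding" means grouping each non-overlapping p×p window of tokens into a single token by concatenating their channels, so an N-token sequence becomes N/p² tokens and the attention score matrix shrinks from N×N to (N/p²)×(N/p²). The helper names `patch_reembed`, `patch_unembed`, and `cross_attention` are hypothetical, and the attention shown is plain single-head cross-attention without learned projections. It is not CAVER's view-mixed attention.

```python
import numpy as np

def patch_reembed(tokens, h, w, p):
    """Hypothetical parameter-free re-embedding: (h*w, c) -> (h/p * w/p, p*p*c)."""
    n, c = tokens.shape
    assert n == h * w and h % p == 0 and w % p == 0
    x = tokens.reshape(h // p, p, w // p, p, c)   # split rows and columns into p-sized blocks
    x = x.transpose(0, 2, 1, 3, 4)                # (h/p, w/p, p, p, c): group each p x p window
    return x.reshape((h // p) * (w // p), p * p * c)  # one token per window, channels concatenated

def patch_unembed(tokens, h, w, p):
    """Exact inverse of patch_reembed: (h/p * w/p, p*p*c) -> (h*w, c)."""
    c = tokens.shape[1] // (p * p)
    x = tokens.reshape(h // p, w // p, p, p, c).transpose(0, 2, 1, 3, 4)
    return x.reshape(h * w, c)

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(q_tokens, kv_tokens):
    """Plain scaled dot-product cross-attention (no learned projections, simplification)."""
    d = q_tokens.shape[1]
    scores = q_tokens @ kv_tokens.T / np.sqrt(d)  # (n_q, n_kv): the quadratic term
    return softmax(scores) @ kv_tokens

# Illustrative RGB and depth token grids with random features.
rng = np.random.default_rng(0)
h, w, c, p = 16, 16, 8, 4
rgb = rng.standard_normal((h * w, c))
depth = rng.standard_normal((h * w, c))

q = patch_reembed(rgb, h, w, p)     # 256 tokens -> 16 tokens of width 128
kv = patch_reembed(depth, h, w, p)
fused = patch_unembed(cross_attention(q, kv), h, w, p)  # back to a 256 x 8 token grid
print(q.shape, fused.shape)
```

With p = 4, the score matrix has 16 × 16 entries instead of 256 × 256, a p⁴ = 256-fold reduction, while the reshaping itself adds no parameters. The trade-off in this sketch is a wider token dimension (p²·c) and attention that operates at patch rather than pixel granularity.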

updated: Fri Apr 29 2022 13:01:46 GMT+0000 (UTC)

published: Sat Dec 04 2021 15:45:34 GMT+0000 (UTC)

arXiv
