Boosting Salient Object Detection with Transformer-based Asymmetric Bilateral U-Net

Yu Qiu; Yun Liu; Le Zhang; Jing Xu

Existing salient object detection (SOD) methods mainly rely on U-shaped convolutional neural networks (CNNs) with skip connections to combine the global contexts and local spatial details that are crucial for locating salient objects and refining object details, respectively. Despite their great success, CNNs have a limited ability to learn global contexts. Recently, the vision transformer has achieved revolutionary progress in computer vision owing to its powerful modeling of global dependencies. However, directly applying the transformer to SOD is suboptimal because the transformer lacks the ability to learn local spatial representations. To this end, this paper explores the combination of transformers and CNNs to learn both global and local representations for SOD. We propose a transformer-based Asymmetric Bilateral U-Net (ABiU-Net). The asymmetric bilateral encoder has a transformer path and a lightweight CNN path, where the two paths communicate at each encoder stage to learn complementary global contexts and local spatial details, respectively. The asymmetric bilateral decoder also consists of two paths to process features from the transformer and CNN encoder paths, with communication at each decoder stage for decoding coarse salient object locations and fine-grained object details, respectively. Such communication between the two encoder/decoder paths enables ABiU-Net to learn complementary global and local representations, taking advantage of the natural merits of transformers and CNNs, respectively. Hence, ABiU-Net provides a new perspective for transformer-based SOD. Extensive experiments demonstrate that ABiU-Net performs favorably against previous state-of-the-art SOD methods. The code is available at https://github.com/yuqiuyuqiu/ABiU-Net.
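The two-path encoder idea can be illustrated with a deliberately tiny toy, shown here only as a sketch and not as the authors' implementation. It works on a 1D list of scalar tokens: `global_mix` stands in for the transformer path (single-head self-attention with identity projections), `local_mix` stands in for the lightweight CNN path (a 3-tap convolution), and `bilateral_encoder` lets the two paths exchange features after every stage. All function names, the additive exchange rule, and the `mix` weight are assumptions made for illustration; the abstract does not specify how the paths communicate.

```python
import math


def global_mix(tokens):
    """Toy transformer path: every output attends to all positions."""
    out = []
    for q in tokens:
        scores = [q * k for k in tokens]
        peak = max(scores)  # subtract the max for a numerically stable softmax
        weights = [math.exp(s - peak) for s in scores]
        total = sum(weights)
        out.append(sum(w * v for w, v in zip(weights, tokens)) / total)
    return out


def local_mix(tokens, kernel=(0.25, 0.5, 0.25)):
    """Toy CNN path: zero-padded 3-tap convolution, neighbours only."""
    padded = [0.0] + list(tokens) + [0.0]
    return [
        sum(k * padded[i + j] for j, k in enumerate(kernel))
        for i in range(len(tokens))
    ]


def bilateral_encoder(signal, num_stages=3, mix=0.5):
    """Run both paths stage by stage with an exchange after each stage.

    The additive exchange weighted by `mix` is a hypothetical choice,
    not taken from the paper.
    """
    g, l = list(signal), list(signal)
    stages = []
    for _ in range(num_stages):
        g_new, l_new = global_mix(g), local_mix(l)
        # Communication step: each path receives a share of the other's output.
        g = [a + mix * b for a, b in zip(g_new, l_new)]
        l = [b + mix * a for a, b in zip(g_new, l_new)]
        stages.append((g, l))
    return stages


if __name__ == "__main__":
    for i, (g, l) in enumerate(bilateral_encoder([0.0, 1.0, 0.0, 2.0]), 1):
        print(f"stage {i}: global={g} local={l}")
```

A decoder in the same spirit would walk the stored stages in reverse, with one path consuming the global features and the other the local features, again exchanging information at each step. The real model operates on 2D image feature maps with learned weights, which this toy omits.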

updated: Mon Aug 21 2023 05:47:52 GMT+0000 (UTC)

published: Tue Aug 17 2021 19:45:28 GMT+0000 (UTC)

arXiv
