SwinFuse: A Residual Swin Transformer Fusion Network for Infrared and Visible Images

Zhishe Wang; Yanlin Chen; Wenyu Shao; Hui Li; Lei Zhang

Existing deep learning fusion methods concentrate mainly on convolutional neural networks, and few attempts have been made with transformers. Moreover, the convolution operation is a content-independent interaction between the image and the convolution kernel, which may lose important context and further limit fusion performance. To address this, we present a simple and strong fusion baseline for infrared and visible images, namely the Residual Swin Transformer Fusion Network, termed SwinFuse. SwinFuse includes three parts: global feature extraction, a fusion layer, and feature reconstruction. In particular, we build a fully attentional feature encoding backbone to model long-range dependencies; it is a pure transformer network and has a stronger representation ability than convolutional neural networks. Moreover, we design a novel feature fusion strategy based on the L_1-norm for sequence matrices, which measures the corresponding activity levels along both the row and column vector dimensions and can thus retain competitive infrared brightness and distinct visible details. Finally, we evaluate SwinFuse against nine state-of-the-art traditional and deep learning methods on three different datasets through subjective observations and objective comparisons. The experimental results show that the proposed SwinFuse obtains surprising fusion performance with strong generalization ability and competitive computational efficiency. The code will be available at https://github.com/Zhishe-Wang/SwinFuse.
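
To make the L_1-norm fusion idea concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes each modality's features form a sequence matrix of shape (N, C) (N tokens, C channels), takes the L_1-norm of each row and each column as the activity level, normalizes the two modalities' activity levels into weights, and averages the row-wise and column-wise fused results. The function name `l1_norm_fuse`, the weight normalization, and the final averaging are all assumptions made for illustration; the abstract does not specify them.

```python
import numpy as np


def l1_norm_fuse(ir, vis, eps=1e-8):
    """Fuse two (N, C) sequence matrices with L1-norm activity weights.

    Hypothetical sketch: the weighting and averaging choices here are
    assumptions, not details taken from the SwinFuse paper.
    """
    if ir.shape != vis.shape:
        raise ValueError("ir and vis must have the same shape")

    # Row-wise activity: L1-norm of every token vector, shape (N, 1).
    act_ir_row = np.abs(ir).sum(axis=1, keepdims=True)
    act_vis_row = np.abs(vis).sum(axis=1, keepdims=True)
    w_ir_row = act_ir_row / (act_ir_row + act_vis_row + eps)
    fused_row = w_ir_row * ir + (1.0 - w_ir_row) * vis

    # Column-wise activity: L1-norm of every channel vector, shape (1, C).
    act_ir_col = np.abs(ir).sum(axis=0, keepdims=True)
    act_vis_col = np.abs(vis).sum(axis=0, keepdims=True)
    w_ir_col = act_ir_col / (act_ir_col + act_vis_col + eps)
    fused_col = w_ir_col * ir + (1.0 - w_ir_col) * vis

    # Assumed combination rule: plain average of the two fused results.
    return 0.5 * (fused_row + fused_col)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    ir_feat = rng.standard_normal((16, 8))   # stand-in infrared features
    vis_feat = rng.standard_normal((16, 8))  # stand-in visible features
    print(l1_norm_fuse(ir_feat, vis_feat).shape)
```

In this sketch, a token or channel with a larger L_1-norm in one modality pulls the fused value toward that modality, which is one plausible way to keep bright infrared targets and detailed visible textures at the same time.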

updated: Mon Apr 25 2022 05:04:19 GMT+0000 (UTC)

published: Mon Apr 25 2022 05:04:19 GMT+0000 (UTC)

arXiv
