MFGNet: Dynamic Modality-Aware Filter Generation for RGB-T Tracking

Xiao Wang; Xiujun Shu; Shiliang Zhang; Bo Jiang; Yaowei Wang; Yonghong Tian; Feng Wu

Many RGB-T trackers attempt to obtain robust feature representations by using an adaptive weighting scheme (or attention mechanism). In contrast to these works, we propose a new dynamic modality-aware filter generation module (named MFGNet) that boosts message communication between visible and thermal data by adaptively adjusting the convolutional kernels for each input image in practical tracking. Given an image pair as input, we first encode its features with the backbone network. We then concatenate these feature maps and generate dynamic modality-aware filters with two independent networks. The visible and thermal filters are used to perform a dynamic convolution on their corresponding input feature maps. Inspired by the residual connection, both the generated visible and thermal feature maps are summed with the input feature maps. The augmented feature maps are fed into the RoI align module to generate instance-level features for subsequent classification. To address issues caused by heavy occlusion, fast motion, and out-of-view targets, we propose to conduct a joint local and global search by exploiting a new direction-aware target-driven attention mechanism. A spatial and temporal recurrent neural network is used to capture the direction-aware context for accurate global attention prediction. Extensive experiments on three large-scale RGB-T tracking benchmark datasets validate the effectiveness of the proposed algorithm. The source code of this paper is available at https://github.com/wangxiao5791509/MFG_RGBT_Tracking_PyTorch.
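The filter-generation step described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation (the released code is in PyTorch): it assumes single-channel feature maps stored as plain Python lists, and the names `generate_filter`, `dynamic_conv`, and `mfg_block`, as well as the one-layer linear generator over globally pooled features, are hypothetical choices made only to keep the example dependency-free. It shows the three ideas named above: each modality's kernel is predicted from both modalities, the kernel is applied to that modality's own feature map, and the result is added back to the input.

```python
# Toy sketch of dynamic modality-aware filtering (assumptions: single-channel
# maps, a hypothetical one-layer filter generator, plain lists, no framework).

def global_mean(fmap):
    """Average of all entries of an H x W feature map."""
    return sum(sum(row) for row in fmap) / (len(fmap) * len(fmap[0]))

def generate_filter(rgb_feat, thermal_feat, weights, k=3):
    """Hypothetical filter generator: one linear layer that maps the pooled,
    concatenated features [mean(rgb), mean(thermal)] to a k x k kernel.
    `weights` holds k*k triples (w_rgb, w_thermal, bias)."""
    p_rgb, p_t = global_mean(rgb_feat), global_mean(thermal_feat)
    flat = [w_rgb * p_rgb + w_t * p_t + b for (w_rgb, w_t, b) in weights]
    return [flat[i * k:(i + 1) * k] for i in range(k)]

def dynamic_conv(fmap, kernel):
    """Same-size 2D cross-correlation with zero padding."""
    h, w, k = len(fmap), len(fmap[0]), len(kernel)
    pad = k // 2
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for dy in range(k):
                for dx in range(k):
                    yy, xx = y + dy - pad, x + dx - pad
                    if 0 <= yy < h and 0 <= xx < w:
                        acc += kernel[dy][dx] * fmap[yy][xx]
            out[y][x] = acc
    return out

def add_maps(a, b):
    """Element-wise sum of two maps (the residual connection)."""
    return [[u + v for u, v in zip(ra, rb)] for ra, rb in zip(a, b)]

def mfg_block(rgb_feat, thermal_feat, rgb_weights, thermal_weights):
    """Two independent generators see both modalities; each predicted kernel
    filters its own modality, and the input is added back afterwards."""
    k_rgb = generate_filter(rgb_feat, thermal_feat, rgb_weights)
    k_t = generate_filter(rgb_feat, thermal_feat, thermal_weights)
    rgb_out = add_maps(rgb_feat, dynamic_conv(rgb_feat, k_rgb))
    thermal_out = add_maps(thermal_feat, dynamic_conv(thermal_feat, k_t))
    return rgb_out, thermal_out

# Usage: the RGB kernel's centre tap is driven by the thermal mean, and the
# thermal kernel's centre tap by the RGB mean; all other taps are zero.
zero = (0.0, 0.0, 0.0)
rgb_w = [zero] * 4 + [(0.0, 1.0, 0.0)] + [zero] * 4
thermal_w = [zero] * 4 + [(1.0, 0.0, 0.0)] + [zero] * 4
rgb = [[1.0, 2.0], [3.0, 4.0]]
thermal = [[0.0, 0.0], [0.0, 4.0]]
rgb_out, thermal_out = mfg_block(rgb, thermal, rgb_w, thermal_w)
```

Because the kernels are recomputed from the current image pair on every call, the convolution itself changes with the input, whereas a fixed kernel followed by attention would only re-weight its output. In a real tracker the generator would be a learned network operating on multi-channel feature maps.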

updated: Mon May 09 2022 11:06:22 GMT+0000 (UTC)

published: Thu Jul 22 2021 03:10:51 GMT+0000 (UTC)

arXiv
