RGBT Tracking via Multi-Adapter Network with Hierarchical Divergence Loss

Andong Lu; Chenglong Li; Yuqing Yan; Jin Tang; Bin Luo

RGBT tracking has attracted increasing attention because RGB and thermal infrared data have strong complementary advantages, which could enable trackers to work all day and in all weather. However, how to effectively represent RGBT data for visual tracking remains understudied. Existing works usually focus on extracting either modality-shared or modality-specific information, but the potential of these two cues is not well explored or exploited in RGBT tracking. In this paper, we propose a novel multi-adapter network that jointly performs modality-shared, modality-specific, and instance-aware target representation learning for RGBT tracking. To this end, we design three kinds of adapters within an end-to-end deep learning framework. Specifically, we use a modified VGG-M as the generality adapter to extract modality-shared target representations. To extract modality-specific features while reducing computational complexity, we design a modality adapter, which adds a small block in parallel to the generality adapter for each layer and each modality. This design can learn multilevel modality-specific representations with a modest number of parameters, since the vast majority of parameters are shared with the generality adapter. We also design an instance adapter to capture the appearance properties and temporal variations of a specific target. Moreover, to enhance the shared and specific features, we employ a multiple-kernel maximum mean discrepancy (MK-MMD) loss to measure the distribution divergence between features of different modalities, and integrate it into each layer for more robust representation learning. Extensive experiments on two RGBT tracking benchmark datasets demonstrate the outstanding performance of the proposed tracker compared with state-of-the-art methods.
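Two mechanisms in the abstract lend themselves to a small illustration: a modality-specific branch running in parallel with a shared (generality) branch in each layer, and an MK-MMD divergence between features of the two modalities. The sketch below is a toy, pure-Python example under loudly stated assumptions, not the authors' implementation. Dense layers replace the VGG-M convolutions. A hypothetical low-rank bottleneck (`rank`) stands in for the paper's small per-modality block. The two branch outputs are assumed to be summed before a ReLU. The Gaussian bandwidths and the equal kernel weights are illustrative. All names (`MultiAdapterLayer`, `mk_mmd2`, `rand_matrix`, `matvec`) are hypothetical.

```python
import math
import random


def matvec(w, x):
    """Multiply a weight matrix w (list of rows) by a vector x."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def rand_matrix(rows, cols, rng, scale=0.1):
    """Random Gaussian weight matrix of shape rows x cols."""
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]


class MultiAdapterLayer:
    """One layer with a shared (generality) branch plus one small parallel
    branch per modality. Hypothetical: the low-rank bottleneck (down, up)
    stands in for the paper's small convolutional block."""

    def __init__(self, dim, rank, modalities=("rgb", "thermal"), seed=0):
        rng = random.Random(seed)
        self.shared = rand_matrix(dim, dim, rng)  # shared by all modalities
        self.specific = {
            m: (rand_matrix(rank, dim, rng), rand_matrix(dim, rank, rng))
            for m in modalities
        }

    def forward(self, x, modality):
        down, up = self.specific[modality]
        shared_out = matvec(self.shared, x)
        specific_out = matvec(up, matvec(down, x))
        # Assumption: branches are combined by element-wise sum, then ReLU.
        return [max(0.0, s + p) for s, p in zip(shared_out, specific_out)]

    def param_counts(self):
        """Return (shared parameter count, per-modality parameter counts)."""
        shared = len(self.shared) * len(self.shared[0])
        per_modality = {
            m: len(d) * len(d[0]) + len(u) * len(u[0])
            for m, (d, u) in self.specific.items()
        }
        return shared, per_modality


def mk_mmd2(xs, ys, bandwidths=(0.5, 1.0, 2.0)):
    """Biased estimate of squared MMD between samples xs and ys, using an
    equally weighted sum of Gaussian kernels (illustrative bandwidths)."""

    def kernel(a, b):
        d2 = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
        return sum(math.exp(-d2 / (2.0 * s * s)) for s in bandwidths)

    def mean_kernel(us, vs):
        return sum(kernel(u, v) for u in us for v in vs) / (len(us) * len(vs))

    return mean_kernel(xs, xs) + mean_kernel(ys, ys) - 2.0 * mean_kernel(xs, ys)
```

With `dim=64` and `rank=4`, `param_counts()` reports 4096 shared parameters and 512 per modality branch, which reflects the abstract's point that most parameters are shared. Passing a batch of `forward(..., "rgb")` outputs and a batch of `forward(..., "thermal")` outputs to `mk_mmd2` gives a per-layer divergence value; how the paper weights or applies this term at each layer is not reproduced here.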

updated: Fri Jun 04 2021 06:53:08 GMT+0000 (UTC)

published: Sat Nov 14 2020 01:50:46 GMT+0000 (UTC)

arXiv