Visual Prompt Multi-Modal Tracking

Jiawen Zhu; Simiao Lai; Xin Chen; Dong Wang; Huchuan Lu

Visible-modal object tracking has given rise to a series of downstream multi-modal tracking branches. To inherit the powerful representations of the foundation model, the natural approach for multi-modal tracking is full fine-tuning of the RGB-based parameters. Although effective, this approach is suboptimal because of, among other factors, the scarcity of downstream data and poor transferability. In this paper, inspired by the recent success of prompt learning in language models, we develop Visual Prompt multi-modal Tracking (ViPT), which learns modal-relevant prompts to adapt a frozen pre-trained foundation model to various downstream multi-modal tracking tasks. ViPT finds a better way to stimulate the knowledge of the RGB-based model that is pre-trained at scale, while introducing only a few trainable parameters (less than 1% of model parameters). ViPT outperforms the full fine-tuning paradigm on multiple downstream tracking tasks, including RGB+Depth, RGB+Thermal, and RGB+Event tracking. Extensive experiments show the potential of visual prompt learning for multi-modal tracking, and ViPT achieves state-of-the-art performance while remaining parameter-efficient. Code and models are available at https://github.com/jiawen-zhu/ViPT.
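
The parameter-efficiency idea in the abstract (a frozen backbone plus a small set of trainable prompt parameters) can be illustrated with a rough bookkeeping sketch. Everything here is an assumption made for illustration, not taken from the ViPT code: the backbone shape (12 blocks, width 768), the bottleneck-style prompt block, the additive way the prompt is merged into the RGB tokens, and all function names.

```python
from dataclasses import dataclass


@dataclass
class ParamGroup:
    name: str
    count: int        # number of scalar parameters in this group
    trainable: bool   # False = frozen, True = updated during adaptation


def build_groups(depth=12, dim=768, bottleneck=8):
    """Hypothetical layout: one frozen backbone block and one small prompt block per layer."""
    groups = []
    for i in range(depth):
        # Frozen transformer block: attention (4*dim^2) + MLP (8*dim^2), biases ignored.
        groups.append(ParamGroup(f"backbone.block{i}", 12 * dim * dim, False))
        # Assumed prompt block: a down-projection and an up-projection (dim x bottleneck each).
        groups.append(ParamGroup(f"prompt.block{i}", 2 * dim * bottleneck, True))
    return groups


def trainable_fraction(groups):
    """Share of all parameters that would receive gradient updates."""
    total = sum(g.count for g in groups)
    trainable = sum(g.count for g in groups if g.trainable)
    return trainable / total


def prompted_tokens(rgb_tokens, aux_prompt, scale):
    """Assumed merge rule: add a scaled auxiliary-modality prompt to each RGB token."""
    return [
        [r + scale * p for r, p in zip(rgb_row, prompt_row)]
        for rgb_row, prompt_row in zip(rgb_tokens, aux_prompt)
    ]


if __name__ == "__main__":
    groups = build_groups()
    print(f"trainable fraction: {trainable_fraction(groups):.4%}")
```

With these illustrative sizes the trainable share comes out well under 1%, in line with the claim in the abstract, though the real figure depends on the actual architecture in the released code.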

updated: Sat Mar 25 2023 02:29:48 GMT+0000 (UTC)

published: Mon Mar 20 2023 01:51:07 GMT+0000 (UTC)

arXiv
