Compositional Prompt Tuning with Motion Cues for Open-vocabulary Video Relation Detection

Kaifeng Gao; Long Chen; Hanwang Zhang; Jun Xiao; Qianru Sun

Prompt tuning with large-scale pretrained vision-language models enables open-vocabulary prediction from models trained on limited base categories, e.g., in object classification and detection. In this paper, we propose compositional prompt tuning with motion cues: an extended prompt tuning paradigm for compositional predictions on video data. In particular, we present Relation Prompt (RePro) for Open-vocabulary Video Visual Relation Detection (Open-VidVRD), where conventional prompt tuning is easily biased toward certain subject-object combinations and motion patterns. To this end, RePro addresses the two technical challenges of Open-VidVRD: 1) the prompt tokens should respect the two different semantic roles of subject and object, and 2) the tuning should account for the diverse spatio-temporal motion patterns of the subject-object compositions. Without bells and whistles, RePro achieves new state-of-the-art performance on two VidVRD benchmarks, not only on the base training object and predicate categories but also on unseen ones. Extensive ablations also demonstrate the effectiveness of the proposed compositional and multi-mode prompt design. Code is available at https://github.com/Dawn-LX/OpenVoc-VidVRD.
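The two design points in the abstract (role-specific prompt tokens and motion-dependent prompt selection) can be illustrated with a toy sketch. This is not the authors' implementation: the three motion modes, the distance-based rule in `motion_mode`, the mean-pooling stand-in for a text encoder, and all names and sizes are assumptions made for illustration only. Learnable tokens are replaced by fixed random vectors, so nothing is actually tuned here.

```python
import math
import random

EMBED_DIM = 8                                  # toy embedding width (assumption)
NUM_TOKENS = 2                                 # prompt tokens per role (assumption)
MOTION_MODES = ("toward", "away", "static")    # hypothetical motion groups


def random_vector(rng, dim=EMBED_DIM):
    return [rng.uniform(-1.0, 1.0) for _ in range(dim)]


def make_prompt_bank(rng):
    """Compositional + multi-mode: separate subject and object tokens per motion mode."""
    return {
        mode: {
            "subject": [random_vector(rng) for _ in range(NUM_TOKENS)],
            "object": [random_vector(rng) for _ in range(NUM_TOKENS)],
        }
        for mode in MOTION_MODES
    }


def motion_mode(subj_centers, obj_centers, tol=1.0):
    """Hypothetical motion cue: change in subject-object distance over the clip."""
    start = math.dist(subj_centers[0], obj_centers[0])
    end = math.dist(subj_centers[-1], obj_centers[-1])
    if end < start - tol:
        return "toward"
    if end > start + tol:
        return "away"
    return "static"


def compose_prompt(bank, mode, predicate_embedding):
    """Stand-in for a text encoder: mean-pool role tokens and the class token."""
    tokens = bank[mode]["subject"] + bank[mode]["object"] + [predicate_embedding]
    return [sum(column) / len(tokens) for column in zip(*tokens)]


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm > 0 else 0.0


def score_predicates(bank, mode, predicate_embeddings, visual_feature):
    """Cosine similarity between each composed prompt and the relation feature."""
    return {
        name: cosine(compose_prompt(bank, mode, emb), visual_feature)
        for name, emb in predicate_embeddings.items()
    }


rng = random.Random(0)
bank = make_prompt_bank(rng)
predicates = {name: random_vector(rng) for name in ("chase", "sit_next_to")}
# Subject moves from x=0 to x=5 while the object stays at x=10.
mode = motion_mode([(0, 0), (5, 0)], [(10, 0), (10, 0)])
scores = score_predicates(bank, mode, predicates, random_vector(rng))
```

In this sketch the motion cue only selects which group of prompt tokens is used, and the subject and object tokens are kept in separate slots so that the two roles are never interchangeable. A real system would pass the composed token sequence through a pretrained text encoder and optimize the tokens on base categories.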

updated: Wed Feb 01 2023 06:20:54 GMT+0000 (UTC)

published: Wed Feb 01 2023 06:20:54 GMT+0000 (UTC)

arXiv
