Interaction-Aware Prompting for Zero-Shot Spatio-Temporal Action Detection

Wei-Jhe Huang; Jheng-Hsien Yeh; Gueter Josmy Faure; Min-Hung Chen; Shang-Hong Lai

Spatio-temporal action detection aims to determine when and where each person's action occurs in a video and to classify the corresponding action category. Most existing methods rely on fully supervised learning, which requires a large amount of training data and makes zero-shot detection very difficult. In this paper, we propose to use a pre-trained visual-language model to extract representative image and text features, and to model the relationships between these features through different interaction modules to obtain an interaction feature. We then use this feature to prompt each label and obtain more appropriate text features. Finally, we compute the similarity between the interaction feature and the text feature of each label to determine the action category. Experiments on the J-HMDB and UCF101-24 datasets demonstrate that the proposed interaction modules and prompting better align the visual-language features, achieving excellent accuracy for zero-shot spatio-temporal action detection. The code will be released upon acceptance.
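The final classification step in the abstract (comparing an interaction feature against one text feature per label) can be illustrated with a minimal sketch. It assumes cosine similarity as the similarity measure, which the abstract does not specify. The function names, label names, and toy vectors are hypothetical, and the interaction modules and prompting themselves are not reproduced here.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def classify_action(interaction_feature, label_text_features):
    """Pick the label whose text feature is most similar to the interaction feature.

    Cosine similarity is an assumption made for this sketch.
    """
    scores = {
        label: cosine_similarity(interaction_feature, text_feature)
        for label, text_feature in label_text_features.items()
    }
    best_label = max(scores, key=scores.get)
    return best_label, scores


# Toy 3-dimensional features (hypothetical); real features would come from
# a pre-trained visual-language model.
label_text_features = {
    "run": [0.9, 0.1, 0.0],
    "jump": [0.1, 0.9, 0.2],
    "sit": [0.0, 0.2, 0.9],
}
interaction_feature = [0.2, 0.8, 0.1]

predicted_label, scores = classify_action(interaction_feature, label_text_features)
print(predicted_label)
```

Because each label is represented only by a text feature, unseen action categories can be scored in the same way without retraining, which is what makes the zero-shot setting possible in principle.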

updated: Mon Apr 10 2023 16:08:59 GMT+0000 (UTC)

published: Mon Apr 10 2023 16:08:59 GMT+0000 (UTC)

arXiv
