DECOMPL: Decompositional Learning with Attention Pooling for Group Activity Recognition from a Single Volleyball Image

Berker Demirel; Huseyin Ozkan

Group Activity Recognition (GAR) aims to detect the activity performed by multiple actors in a scene. Prior works model spatio-temporal features based on RGB, optical flow, or keypoint data. However, using temporality together with these data types increases the computational complexity significantly. Our hypothesis is that using only RGB data, without temporality, maintains performance with a negligible loss in accuracy. To that end, we propose DECOMPL, a novel GAR technique for volleyball videos that consists of two complementary branches. The visual branch extracts features selectively using attention pooling. The coordinate branch considers the current configuration of the actors and extracts spatial information from the box coordinates. Moreover, we analyzed the Volleyball dataset, on which the recent literature is mostly based, and found that its labeling scheme degrades the group concept in the activities to the level of individual actors. We manually reannotated the dataset in a systematic manner to emphasize the group concept. Experimental results on the Volleyball dataset as well as the Collective Activity dataset (from another domain, i.e., not volleyball) demonstrate the effectiveness of DECOMPL: among comparable state-of-the-art techniques, it delivers the best GAR performance with the reannotations and the second-best with the original annotations. Our code, results and new annotations will be made available through GitHub after the revision process.
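The abstract names two ingredients without giving details: attention pooling over per-actor visual features, and spatial features derived from box coordinates. The following is a minimal sketch of those two generic ideas, not the authors' implementation. The function names, the dot-product scoring against a single `query` vector, and the choice of normalized corners plus box center as coordinate features are all assumptions made for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(features, query):
    """Pool N per-actor feature vectors into one group-level vector.

    features: list of N vectors (one per actor), each of length D.
    query:    hypothetical scoring vector of length D; in a trained
              model this would be learned (assumption, not from the paper).
    Returns (pooled_vector, attention_weights).
    """
    # Score each actor by a dot product with the query vector.
    scores = [sum(f_d * q_d for f_d, q_d in zip(f, query)) for f in features]
    weights = softmax(scores)
    # Weighted sum of actor features, one dimension at a time.
    pooled = [
        sum(w * f[d] for w, f in zip(weights, features))
        for d in range(len(query))
    ]
    return pooled, weights


def box_to_coordinate_features(box, image_width, image_height):
    """Turn a pixel box (x1, y1, x2, y2) into normalized spatial features.

    Output: normalized corners followed by the normalized box center.
    This feature layout is an illustrative assumption.
    """
    x1, y1, x2, y2 = box
    nx1, ny1 = x1 / image_width, y1 / image_height
    nx2, ny2 = x2 / image_width, y2 / image_height
    return [nx1, ny1, nx2, ny2, (nx1 + nx2) / 2, (ny1 + ny2) / 2]
```

With an all-zero query every actor receives equal weight, so the pooling reduces to plain averaging. A non-zero query shifts weight toward actors whose features align with it, which is one way to read "selective" feature extraction.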

updated: Sat Mar 11 2023 16:30:51 GMT+0000 (UTC)

published: Sat Mar 11 2023 16:30:51 GMT+0000 (UTC)

arXiv
