Skeleton-based Action Recognition via Temporal-Channel Aggregation

Shengqin Wang; Yongji Zhang; Fenglin Wei; Kai Wang; Minghao Zhao; Yu Jiang

Skeleton-based action recognition methods are limited by how well they extract semantics from spatio-temporal skeletal maps. Current methods have difficulty combining features from the temporal and spatial graph dimensions effectively, and tend to emphasize one at the expense of the other. In this paper, we propose a Temporal-Channel Aggregation Graph Convolutional Network (TCA-GCN) that learns spatial and temporal topologies dynamically and efficiently aggregates topological features across different temporal and channel dimensions for skeleton-based action recognition. We use the Temporal Aggregation module to learn temporal-dimension features, and the Channel Aggregation module to efficiently combine the spatial dynamic topological features learned channel-wise with the temporal dynamic topological features. In addition, we extract multi-scale skeletal features in temporal modeling and fuse them with prior skeletal knowledge through an attention mechanism. Extensive experiments show that our model outperforms state-of-the-art methods on the NTU RGB+D, NTU RGB+D 120, and NW-UCLA datasets.
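As a rough illustration of the two kinds of aggregation the abstract refers to, the sketch below mixes joint features along the skeleton graph (spatial) and then averages them over frame windows of several sizes (multi-scale temporal). It is a generic GCN-style toy in plain Python, not the TCA-GCN modules themselves: the function names, the fixed row-normalized adjacency, the single feature channel, and the window sizes are all assumptions made for illustration, and nothing here is learned or uses attention.

```python
def normalize_adjacency(edges, num_joints):
    """Build a row-normalized adjacency matrix with self-loops (assumed fixed, not learned)."""
    adj = [[0.0] * num_joints for _ in range(num_joints)]
    for i in range(num_joints):
        adj[i][i] = 1.0
    for u, v in edges:
        adj[u][v] = 1.0
        adj[v][u] = 1.0
    return [[a / sum(row) for a in row] for row in adj]


def spatial_aggregate(frames, adj):
    """Mix each joint's feature with its neighbors' features, frame by frame."""
    return [
        [sum(w * x for w, x in zip(row, frame)) for row in adj]
        for frame in frames
    ]


def temporal_aggregate(frames, window):
    """Average each joint's feature over a centered window of frames."""
    half = window // 2
    num_joints = len(frames[0])
    out = []
    for t in range(len(frames)):
        lo, hi = max(0, t - half), min(len(frames), t + half + 1)
        span = frames[lo:hi]
        out.append([sum(f[v] for f in span) / len(span) for v in range(num_joints)])
    return out


def multi_scale_features(frames, adj, windows=(1, 3)):
    """Spatial aggregation followed by temporal averaging at several window sizes."""
    spatial = spatial_aggregate(frames, adj)
    return [temporal_aggregate(spatial, w) for w in windows]


# Hypothetical 3-joint chain (0 - 1 - 2) with one scalar feature per joint per frame.
adj = normalize_adjacency([(0, 1), (1, 2)], num_joints=3)
frames = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
scales = multi_scale_features(frames, adj)
```

Here `frames` is indexed as `frames[t][v]` (frame, then joint), and `scales` holds one feature sequence per temporal window. A learned model would replace the fixed adjacency and uniform averaging with trainable, input-dependent weights.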

updated: Tue May 31 2022 16:28:30 GMT+0000 (UTC)

published: Tue May 31 2022 16:28:30 GMT+0000 (UTC)

arXiv
