Skeleton-based Action Recognition via Temporal-Channel Aggregation

Shengqin Wang; Yongji Zhang; Minghao Zhao; Hong Qi; Kai Wang; Fenglin Wei; Yu Jiang

Skeleton-based action recognition methods are limited by how well they extract semantics from spatio-temporal skeletal maps. Current methods, however, have difficulty combining features from the temporal and spatial graph dimensions effectively, and tend to emphasize one dimension at the expense of the other. In this paper, we propose Temporal-Channel Aggregation Graph Convolutional Networks (TCA-GCN), which learn spatial and temporal topologies dynamically and efficiently aggregate topological features across different temporal and channel dimensions for skeleton-based action recognition. We use the Temporal Aggregation module to learn temporal-dimension features and the Channel Aggregation module to efficiently combine spatial dynamic channel-wise topological features with temporal dynamic topological features. In addition, we extract multi-scale skeletal features in temporal modeling and fuse them with an attention mechanism. Extensive experiments show that our model outperforms state-of-the-art methods on the NTU RGB+D, NTU RGB+D 120, and NW-UCLA datasets.
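The abstract's last technical step, extracting multi-scale temporal features and fusing them with attention, can be illustrated with a toy sketch. This is not the TCA-GCN implementation: it assumes a single scalar feature per frame, uses moving averages as stand-ins for the temporal branches, and treats the attention scores as given logits rather than learned ones. The function names `temporal_avg_pool`, `softmax`, and `fuse_multiscale`, and the window sizes, are hypothetical choices made for illustration only.

```python
import math


def temporal_avg_pool(seq, window):
    """Moving average over time; output has the same length as the input.

    Windows are truncated at the sequence boundaries (assumption for this sketch).
    """
    half = window // 2
    out = []
    for t in range(len(seq)):
        lo, hi = max(0, t - half), min(len(seq), t + half + 1)
        out.append(sum(seq[lo:hi]) / (hi - lo))
    return out


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_multiscale(seq, windows=(1, 3, 5), scores=(0.0, 0.0, 0.0)):
    """Compute one branch per temporal scale and mix them with attention weights.

    `scores` stands in for learned attention logits (hypothetical here).
    """
    branches = [temporal_avg_pool(seq, w) for w in windows]
    weights = softmax(list(scores))
    return [
        sum(w * branch[t] for w, branch in zip(weights, branches))
        for t in range(len(seq))
    ]


if __name__ == "__main__":
    # One joint coordinate over 8 frames (made-up values).
    signal = [0.0, 1.0, 0.0, 1.0, 0.0, 1.0, 0.0, 1.0]
    print(fuse_multiscale(signal, scores=(0.5, 1.0, 0.2)))
```

Raising the score of a wider window shifts the fused output toward smoother, longer-range temporal context, while raising the score of the narrowest window keeps frame-level detail; in a trained model these weights would be produced by the network rather than set by hand.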

updated: Mon Aug 08 2022 12:41:25 GMT+0000 (UTC)

published: Tue May 31 2022 16:28:30 GMT+0000 (UTC)

arXiv
