TSGCNeXt: Dynamic-Static Multi-Graph Convolution for Efficient Skeleton-Based Action Recognition with Long-term Learning Potential

Dongjingdin Liu; Pengpeng Chen; Miao Yao; Yijing Lu; Zijie Cai; Yuxin Tian

Skeleton-based action recognition has achieved remarkable results in human action recognition with the development of graph convolutional networks (GCNs). However, recent works tend to construct complex learning mechanisms with redundant training, and they suffer from a bottleneck on long time series. To solve these problems, we propose the Temporal-Spatio Graph ConvNeXt (TSGCNeXt) to explore an efficient learning mechanism for long temporal skeleton sequences. First, we propose a new graph learning mechanism with a simple structure, Dynamic-Static Separate Multi-graph Convolution (DS-SMG), which aggregates features of multiple independent topological graphs and prevents node information from being ignored during dynamic convolution. Next, we construct a graph convolution training acceleration mechanism that optimizes the back-propagation computation of dynamic graph learning, yielding a 55.08% speed-up. Finally, TSGCNeXt restructures the overall GCN architecture with three spatio-temporal learning modules, efficiently modeling long temporal features. Compared with previous methods on the large-scale datasets NTU RGB+D 60 and 120, TSGCNeXt outperforms them in the single-stream setting. In addition, with the EMA model introduced into the multi-stream fusion, TSGCNeXt achieves SOTA levels. On the cross-subject and cross-set benchmarks of NTU 120, accuracies reach 90.22% and 91.74%.
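
The abstract mentions an EMA model and multi-stream fusion without giving details. As a rough illustration only, the sketch below shows one common way these two ideas are realized: an exponential moving average over parameters, and a weighted sum of per-stream class probabilities. Reading "ema" as exponential moving average is an assumption, and the function names, the decay value, the stream weights, and the example logits are all hypothetical; none of this is taken from the paper's implementation.

```python
import math


def ema_update(ema_params, params, decay=0.999):
    """One EMA step: ema <- decay * ema + (1 - decay) * current (decay is an assumed value)."""
    return [decay * e + (1.0 - decay) * p for e, p in zip(ema_params, params)]


def softmax(logits):
    """Numerically stable softmax over a list of class logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [x / total for x in exps]


def fuse_streams(stream_logits, weights):
    """Weighted sum of per-stream class probabilities.

    stream_logits: one list of class logits per stream (hypothetical streams).
    weights: one fusion weight per stream (hypothetical values).
    Returns (predicted class index, fused scores).
    """
    num_classes = len(stream_logits[0])
    fused = [0.0] * num_classes
    for logits, w in zip(stream_logits, weights):
        for c, p in enumerate(softmax(logits)):
            fused[c] += w * p
    best = max(range(num_classes), key=fused.__getitem__)
    return best, fused


if __name__ == "__main__":
    # Toy EMA step on a single scalar "parameter".
    print(ema_update([1.0], [0.0], decay=0.9))
    # Two hypothetical streams voting over two classes with equal weights.
    pred, scores = fuse_streams([[2.0, 0.0], [0.0, 1.0]], [1.0, 1.0])
    print(pred, scores)
```

In this kind of setup the EMA copy of the weights is typically the one evaluated, and each stream's EMA model contributes its scores to the fusion; whether TSGCNeXt follows exactly this scheme is not stated in the abstract.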

updated: Sun Apr 23 2023 12:10:36 GMT+0000 (UTC)

published: Sun Apr 23 2023 12:10:36 GMT+0000 (UTC)

arXiv
