Constructing Stronger and Faster Baselines for Skeleton-based Action Recognition

Yi-Fan Song; Zhang Zhang; Caifeng Shan; Liang Wang

One essential problem in skeleton-based action recognition is how to extract discriminative features over all skeleton joints. However, recent State-Of-The-Art (SOTA) models for this task tend to be exceedingly sophisticated and over-parameterized. Their low efficiency in training and inference has increased the cost of validating model architectures on large-scale datasets. To address this issue, recent advanced separable convolutional layers are embedded into an early fused Multiple Input Branches (MIB) network, constructing an efficient Graph Convolutional Network (GCN) baseline for skeleton-based action recognition. In addition, based on this baseline, we design a compound scaling strategy that expands the model's width and depth synchronously, and eventually obtain a family of efficient GCN baselines with high accuracy and a small number of trainable parameters, termed EfficientGCN-Bx, where "x" denotes the scaling coefficient. On two large-scale datasets, i.e., NTU RGB+D 60 and 120, the proposed EfficientGCN-B4 baseline outperforms other SOTA methods, e.g., achieving 91.7% accuracy on the cross-subject benchmark of the NTU 60 dataset while being 3.15x smaller and 3.21x faster than MS-G3D, one of the best SOTA methods. The source code (PyTorch) and the pretrained models are available at https://github.com/yfsong0709/EfficientGCNv1.
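
As a rough illustration of the two efficiency ideas named in the abstract, the sketch below (a) counts the weights of a standard versus a depthwise-separable 1D temporal convolution and (b) applies an EfficientNet-style compound scaling rule in which width grows as width_base**x and depth as depth_base**x for a scaling coefficient x. The multipliers 1.2 and 1.35, the rounding of widths to multiples of 8, the base configuration, and all function names are hypothetical choices for this example, not values or code taken from the EfficientGCN repository.

```python
import math
from typing import List, Tuple


def conv1d_params(c_in: int, c_out: int, k: int) -> int:
    """Weights of a standard 1D convolution (bias ignored)."""
    return c_in * c_out * k


def separable_conv1d_params(c_in: int, c_out: int, k: int) -> int:
    """Weights of a depthwise-separable 1D convolution (bias ignored).

    Depthwise part: one k-tap filter per input channel (c_in * k).
    Pointwise part: a 1x1 convolution mixing channels (c_in * c_out).
    """
    return c_in * k + c_in * c_out


def scale_baseline(
    base_channels: List[int],
    base_depths: List[int],
    x: int,
    width_base: float = 1.2,   # hypothetical width multiplier per step of x
    depth_base: float = 1.35,  # hypothetical depth multiplier per step of x
    divisor: int = 8,          # hypothetical: round widths to multiples of 8
) -> Tuple[List[int], List[int]]:
    """Return (channels, depths) of a hypothetical B<x> variant.

    Width and depth are scaled together ("compound" scaling):
    channels by width_base**x, block counts by depth_base**x.
    """
    width_mult = width_base ** x
    depth_mult = depth_base ** x
    # Round each scaled width to the nearest multiple of `divisor`.
    channels = [
        max(divisor, int(round(c * width_mult / divisor)) * divisor)
        for c in base_channels
    ]
    # Round block counts up so that depth never shrinks.
    depths = [int(math.ceil(d * depth_mult)) for d in base_depths]
    return channels, depths


if __name__ == "__main__":
    print(conv1d_params(64, 64, 9), separable_conv1d_params(64, 64, 9))
    for x in (0, 2, 4):
        print(f"B{x}:", scale_baseline([48, 64, 128], [1, 2, 2], x))
```

With the illustrative base widths [48, 64, 128] and block counts [1, 2, 2], x = 2 yields widths [72, 96, 184] and depths [2, 4, 4]; a separable layer with 64 channels and a 9-frame temporal kernel needs 4,672 weights instead of 36,864. Scaling width and depth with a single coefficient keeps the family of models described by one knob, which is the general idea the abstract attributes to EfficientGCN-Bx.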

updated: Tue Jun 29 2021 07:09:11 GMT+0000 (UTC)

published: Tue Jun 29 2021 07:09:11 GMT+0000 (UTC)

arXiv
