Topology-aware Convolutional Neural Network for Efficient Skeleton-based Action Recognition

Kailin Xu; Fanfan Ye; Qiaoyong Zhong; Di Xie


In skeleton-based action recognition, graph convolutional networks (GCNs) have developed rapidly, whereas convolutional neural networks (CNNs) have received less attention. One reason is that CNNs are considered poor at modeling the irregular skeleton topology. To alleviate this limitation, we propose a pure CNN architecture named Topology-aware CNN (Ta-CNN). In particular, we develop a novel cross-channel feature augmentation module, which is a combination of map-attend-group-map operations. By applying the module at the coordinate level and then at the joint level, the topology feature is effectively enhanced. Notably, we theoretically prove that graph convolution is a special case of normal convolution when the joint dimension is treated as channels. This confirms that the topology modeling power of GCNs can also be implemented with a CNN. Moreover, we design a SkeletonMix strategy, which mixes two persons in a unique manner and further boosts performance. Extensive experiments are conducted on four widely used datasets, i.e., N-UCLA, SBU, NTU RGB+D, and NTU RGB+D 120, to verify the effectiveness of Ta-CNN. Ta-CNN significantly surpasses existing CNN-based methods. Compared with leading GCN-based methods, it achieves comparable performance with much lower complexity in terms of the required GFLOPs and parameters.
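The claim that graph convolution is a special case of normal convolution when joints are treated as channels can be checked numerically for the neighbor-aggregation step. The following is a minimal sketch, not the authors' proof or implementation: the tensor layout `(C, T, V)`, the random adjacency matrix `adj`, and the helper names `graph_aggregate` and `conv1x1` are all assumptions made for illustration, and the learned feature transform of a full GCN layer is left out.

```python
import numpy as np

def graph_aggregate(x, adj):
    """Graph aggregation over joints (illustrative helper).
    x: (C, T, V) features, adj: (V, V) adjacency. Returns (C, T, V)."""
    return np.einsum('ctv,vw->ctw', x, adj)

def conv1x1(x, weight):
    """Ordinary pointwise (1x1) convolution (illustrative helper).
    x: (C_in, H, W), weight: (C_out, C_in). Returns (C_out, H, W)."""
    return np.einsum('oi,ihw->ohw', weight, x)

rng = np.random.default_rng(0)
C, T, V = 3, 8, 5                      # coordinates, frames, joints (assumed sizes)
x = rng.standard_normal((C, T, V))
adj = rng.random((V, V))               # stand-in for a skeleton adjacency matrix

# GCN-style aggregation: each joint sums features from its neighbors.
gcn_out = graph_aggregate(x, adj)

# CNN view: move the joint axis into the channel position, then apply a
# 1x1 convolution whose weight matrix is the transposed adjacency.
x_joints_as_channels = x.transpose(2, 1, 0)           # (V, T, C)
cnn_out = conv1x1(x_joints_as_channels, adj.T)        # (V, T, C)
cnn_out = cnn_out.transpose(2, 1, 0)                  # back to (C, T, V)

match = np.allclose(gcn_out, cnn_out)
print(match)
```

In this sketch the adjacency matrix is fixed, so the convolution reproduces the graph aggregation exactly. A convolution with free weights over the joint channels can represent any such matrix, which is the sense in which the graph operation is the special case.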

updated: Thu Dec 09 2021 02:42:44 GMT+0000 (UTC)

published: Wed Dec 08 2021 09:02:50 GMT+0000 (UTC)

arXiv

