Real-World Graph Convolution Networks (RW-GCNs) for Action Recognition in Smart Video Surveillance

Justin Sanchez; Christopher Neff; Hamed Tabkhi

Action recognition is a key algorithmic part of emerging on-the-edge smart video surveillance and security systems. Skeleton-based action recognition is an attractive approach that, instead of using RGB pixel data, relies on human pose information to classify actions. However, existing algorithms often assume ideal conditions that are not representative of real-world limitations, such as noisy input, latency requirements, and edge resource constraints. To address the limitations of existing approaches, this paper presents Real-World Graph Convolution Networks (RW-GCNs), an architecture-level solution for meeting the domain constraints of real-world skeleton-based action recognition. Inspired by the presence of feedback connections in the human visual cortex, RW-GCNs leverage attentive feedback augmentation on existing near state-of-the-art (SotA) Spatial-Temporal Graph Convolution Networks (ST-GCNs). The ST-GCNs' design choices are derived from information-theory-centric principles to address both the spatial and temporal noise typically encountered in end-to-end real-time and on-the-edge smart video systems. Our results demonstrate RW-GCNs' ability to serve these applications by achieving a new SotA accuracy of 94.1% on the NTU-RGB-D-120 dataset, and by achieving 32X lower latency than baseline ST-GCN applications while still reaching 90.4% accuracy on the Northwestern UCLA dataset in the presence of spatial keypoint noise. RW-GCNs further show system scalability by running on the 10X more cost-effective NVIDIA Jetson Nano (as opposed to the NVIDIA Xavier NX), while still maintaining a respectable range of throughput (15.6 to 5.5 actions per second) on the resource-constrained device. The code is available at https://github.com/TeCSAR-UNCC/RW-GCN.

updated: Sat Jan 15 2022 02:29:36 GMT+0000 (UTC)

published: Sat Jan 15 2022 02:29:36 GMT+0000 (UTC)

arXiv
