Real-time Human Action Recognition Using Locally Aggregated Kinematic-Guided Skeletonlet and Supervised Hashing-by-Analysis Model

Bin Sun; Dehui Kong; Shaofan Wang; Lichun Wang; Baocai Yin

3D action recognition refers to the classification of action sequences composed of 3D skeleton joints. Although much research has been devoted to 3D action recognition, it mainly suffers from three problems: highly complicated articulation, a large amount of noise, and low implementation efficiency. To tackle all of these problems, we propose a real-time 3D action recognition framework that integrates the locally aggregated kinematic-guided skeletonlet (LAKS) with a supervised hashing-by-analysis (SHA) model. We first define the skeletonlet as a few combinations of joint offsets grouped according to kinematic principles, and then represent an action sequence using LAKS, which consists of a denoising phase and a locally aggregating phase. The denoising phase detects noisy action data and adjusts it by replacing all of the features within it with the features of the corresponding previous frame, while the locally aggregating phase sums, over all offset features of the sequence, the differences between each offset feature of the skeletonlet and its cluster center. Finally, the SHA model combines sparse representation with a hashing model, aiming to improve recognition accuracy while maintaining high efficiency. Experimental results on the MSRAction3D, UTKinectAction3D and Florence3DAction datasets demonstrate that the proposed method outperforms state-of-the-art methods in both recognition accuracy and implementation efficiency.
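
The two LAKS phases described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The kinematic chain used as a skeletonlet, the distance-threshold test for noisy frames, and the precomputed cluster centers are all hypothetical choices. The aggregation step is written as a VLAD-style residual sum. The SHA classifier is not covered.

```python
import math
from typing import List, Sequence, Tuple

Joint = Tuple[float, float, float]
Frame = List[Joint]


def skeletonlet_offsets(frame: Frame, chain: Sequence[int]) -> List[float]:
    """Concatenate child-minus-parent joint offsets along one kinematic chain.

    Hypothetical skeletonlet: `chain` lists joint indices from parent to child
    (e.g. shoulder -> elbow -> wrist); the paper's exact grouping is not given here.
    """
    feat: List[float] = []
    for parent, child in zip(chain, chain[1:]):
        feat.extend(c - p for p, c in zip(frame[parent], frame[child]))
    return feat


def denoise(features: List[List[float]], threshold: float) -> List[List[float]]:
    """Replace a noisy frame's features with those of the previous frame.

    Assumption: a frame counts as noisy when its feature vector jumps more than
    `threshold` (Euclidean distance) from the previous, already-cleaned frame.
    """
    if not features:
        return []
    cleaned = [list(features[0])]
    for feat in features[1:]:
        prev = cleaned[-1]
        if math.dist(feat, prev) > threshold:
            cleaned.append(list(prev))  # noisy: copy the previous frame's features
        else:
            cleaned.append(list(feat))
    return cleaned


def locally_aggregate(features: List[List[float]],
                      centers: List[List[float]]) -> List[float]:
    """Sum, per cluster center, the residuals of the features assigned to it.

    `centers` are assumed to come from an earlier clustering step (e.g. k-means).
    Returns one flat vector of length len(centers) * feature_dim.
    """
    dim = len(centers[0])
    agg = [[0.0] * dim for _ in centers]
    for feat in features:
        # Hard-assign each offset feature to its nearest center.
        k = min(range(len(centers)), key=lambda i: math.dist(feat, centers[i]))
        for d in range(dim):
            agg[k][d] += feat[d] - centers[k][d]
    return [v for row in agg for v in row]
```

Common VLAD variants also apply power and L2 normalization to the aggregated vector. That step is omitted here because it is not described in this listing.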

updated: Mon May 24 2021 14:46:40 GMT+0000 (UTC)

published: Mon May 24 2021 14:46:40 GMT+0000 (UTC)

arXiv