Efficient Human Pose Estimation via 3D Event Point Cloud

Jiaan Chen; Hao Shi; Yaozu Ye; Kailun Yang; Lei Sun; Kaiwei Wang

Human Pose Estimation (HPE) based on RGB images has developed rapidly thanks to deep learning. However, event-based HPE has not been fully studied, and it holds great potential for applications in extreme scenes and efficiency-critical conditions. In this paper, we are the first to estimate 2D human pose directly from a 3D event point cloud. We propose a novel representation of events, the rasterized event point cloud, which aggregates events at the same position within a small time slice. It maintains the 3D features from multiple statistical cues and significantly reduces memory consumption and computational complexity, which proves efficient in our work. We then use the rasterized event point cloud as input to three different backbones, PointNet, DGCNN, and Point Transformer, with two linear layer decoders to predict the locations of human keypoints. We find that, with our method, PointNet achieves promising results at much faster speed, whereas Point Transformer reaches much higher accuracy, even close to that of previous event-frame-based methods. A comprehensive set of results demonstrates that our proposed method is consistently effective for these 3D backbone models in event-driven human pose estimation. Our PointNet-based method with 2048 input points achieves an MPJPE3D of 82.46mm on the DHP19 dataset with a latency of only 12.29ms on an NVIDIA Jetson Xavier NX edge computing platform, which makes it well suited to real-time detection with event cameras. Code will be made publicly available at https://github.com/MasterHow/EventPointPose.
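The rasterization step described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function name `rasterize_events`, the `[x, y, t, p]` input layout, the `slice_duration` parameter, and the choice of statistical cues (mean timestamp, summed polarity, event count) are all assumptions made for illustration. The abstract only states that events at the same position within a small time slice are aggregated and that multiple statistical cues are kept.

```python
import numpy as np

def rasterize_events(events, slice_duration):
    """Aggregate events sharing a pixel and a time slice into one point.

    events: (N, 4) array with columns [x, y, t, p] (assumed layout).
    slice_duration: length of one time slice, in the same unit as t.
    Returns an (M, 5) array with columns
    [x, y, mean timestamp, summed polarity, event count],
    ordered by (time slice, y, x). The cues are illustrative assumptions.
    """
    events = np.asarray(events, dtype=np.float64)
    x = events[:, 0].astype(np.int64)
    y = events[:, 1].astype(np.int64)
    t = events[:, 2]
    p = events[:, 3]

    # Index of the time slice each event falls into.
    k = np.floor((t - t.min()) / slice_duration).astype(np.int64)

    # One group per unique (slice, y, x) cell.
    keys = np.stack([k, y, x], axis=1)
    uniq, inverse = np.unique(keys, axis=0, return_inverse=True)
    inverse = inverse.reshape(-1)  # keep a flat index across NumPy versions

    # Per-cell statistics computed with weighted bin counts.
    count = np.bincount(inverse, minlength=len(uniq)).astype(np.float64)
    t_mean = np.bincount(inverse, weights=t, minlength=len(uniq)) / count
    p_sum = np.bincount(inverse, weights=p, minlength=len(uniq))

    return np.column_stack([uniq[:, 2], uniq[:, 1], t_mean, p_sum, count])


# Four raw events collapse into three rasterized points.
raw = np.array([
    [1, 2, 0.0, 1],
    [1, 2, 0.5, 1],
    [1, 2, 1.5, -1],
    [3, 4, 0.2, -1],
])
points = rasterize_events(raw, slice_duration=1.0)
```

In this sketch the two events at pixel (1, 2) in the first slice merge into a single point, which is how such a representation can shrink the point cloud before it is passed to a backbone such as PointNet.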

updated: Thu Jun 09 2022 13:50:20 GMT+0000 (UTC)

published: Thu Jun 09 2022 13:50:20 GMT+0000 (UTC)

arXiv
