3DInAction: Understanding Human Actions in 3D Point Clouds

Yizhak Ben-Shabat; Oren Shrout; Stephen Gould

We propose a novel method for 3D point cloud action recognition. Understanding human actions in RGB videos has been widely studied in recent years; however, its 3D point cloud counterpart remains under-explored. This is mostly due to the inherent limitations of the point cloud data modality (lack of structure, permutation invariance, and a varying number of points), which make it difficult to learn a spatio-temporal representation. To address these limitations, we propose the 3DinAction pipeline, which first estimates patches moving in time (t-patches) as a key building block, alongside a hierarchical architecture that learns an informative spatio-temporal representation. We show that our method achieves improved performance on existing datasets, including DFAUST and IKEA ASM.
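To make the idea of a patch moving in time more concrete, the following is a minimal sketch and not the authors' implementation. It assumes that a t-patch can be approximated by taking the k nearest neighbours of an origin point in each frame and re-anchoring that origin on the closest point of the next frame. The function names `knn_indices` and `build_t_patch`, the parameter `k`, and the tracking rule itself are illustrative assumptions that the abstract does not specify.

```python
import numpy as np


def knn_indices(points, query, k):
    """Return indices of the k points closest to `query` (Euclidean distance)."""
    distances = np.linalg.norm(points - query, axis=1)
    return np.argsort(distances)[:k]


def build_t_patch(frames, seed_index, k):
    """Follow one local patch through a sequence of point clouds.

    frames: list of (N_t, 3) arrays; N_t may differ between frames,
            but every frame must contain at least k points.
    seed_index: index of the patch origin in the first frame.
    Returns an array of shape (T, k, 3), one k-point patch per frame.

    Assumption (hypothetical tracking rule): the origin in each frame is
    the point closest to the previous frame's origin.
    """
    origin = frames[0][seed_index]
    patches = []
    for points in frames:
        # Re-anchor the origin on the nearest point of the current frame.
        origin = points[knn_indices(points, origin, 1)[0]]
        # Collect the k nearest neighbours of the origin as this frame's patch.
        patches.append(points[knn_indices(points, origin, k)])
    return np.stack(patches)
```

Because each frame contributes exactly k points, the result is a regular (T, k, 3) array even when the input clouds are unordered and differ in size, which is the kind of structure a spatio-temporal network can consume. Nearest-point tracking of this kind can drift under fast motion or heavy occlusion, so it should be read only as an illustration of the concept.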

updated: Sat Mar 11 2023 08:42:54 GMT+0000 (UTC)

published: Sat Mar 11 2023 08:42:54 GMT+0000 (UTC)

arXiv
