Learning from Synthetic Human Group Activities

Che-Jui Chang; Honglu Zhou; Parth Goel; Aditya Bhat; Seonghyeon Moon; Samuel S. Sohn; Sejong Yoon; Vladimir Pavlovic; Mubbasir Kapadia

Understanding complex human interactions and group activities has garnered attention in human-centric computer vision. However, progress on the related tasks is hindered by the difficulty of obtaining large-scale labeled real-world datasets. To mitigate this issue, we propose M3Act, a multi-view, multi-group, multi-person generator of human atomic action and group activity data. Powered by the Unity engine, M3Act contains simulation-ready 3D scenes and human assets, configurable lighting and camera systems, highly parameterized modular group activities, and a large degree of domain randomization during data generation. Our data generator can produce large-scale datasets of human activities with multiple viewpoints, multiple modalities (RGB images, 2D poses, 3D motions), and high-quality annotations for individual persons and multi-person groups (2D bounding boxes, instance segmentation masks, individual action categories, and group activity categories). Using M3Act, we perform synthetic data pre-training for 2D skeleton-based group activity recognition and RGB-based multi-person pose tracking. The results indicate that learning from our synthetic datasets substantially improves model performance on real-world datasets, with the highest gains of 5.59% and 7.32% in group and person recognition accuracy, respectively, on CAD2, as well as an improvement of 6.63 in MOTP on HiEve. Pre-training with our synthetic data also leads to faster model convergence on downstream tasks (up to 6.8% faster). Moreover, M3Act opens new research problems for 3D group activity generation. We release M3Act3D, an 87.6-hour 3D motion dataset of human activities with larger group sizes and more complex inter-person interactions than previous multi-person datasets. We define multiple metrics and propose a competitive baseline for the novel task.
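To make the idea of domain randomization during data generation concrete, the following is a minimal Python sketch that samples one randomized scene configuration per generated clip. It is an illustration only and not M3Act's implementation, which runs inside Unity. The `SceneConfig` fields, the value ranges, and the activity labels are all hypothetical assumptions, chosen only to mirror the kinds of parameters the abstract mentions (lighting, cameras, group composition).

```python
import random
from dataclasses import dataclass
from typing import List

# Placeholder activity labels, assumed for illustration (not M3Act's label set).
ACTIVITIES = ["walk", "queue", "talk"]


@dataclass
class SceneConfig:
    # Every field and range below is a hypothetical parameter, not taken from M3Act.
    light_intensity: float        # relative brightness multiplier
    camera_yaw_deg: float         # camera heading around the scene
    camera_height_m: float        # camera elevation in meters
    persons_per_group: List[int]  # one entry per group in the scene
    activity: str                 # group activity label for the clip


def sample_scene_config(rng: random.Random) -> SceneConfig:
    """Draw one randomized scene configuration from fixed ranges."""
    num_groups = rng.randint(1, 3)
    return SceneConfig(
        light_intensity=rng.uniform(0.5, 1.5),
        camera_yaw_deg=rng.uniform(0.0, 360.0),
        camera_height_m=rng.uniform(1.5, 6.0),
        persons_per_group=[rng.randint(2, 8) for _ in range(num_groups)],
        activity=rng.choice(ACTIVITIES),
    )


# A seeded generator makes the sampled dataset configuration reproducible.
rng = random.Random(0)
configs = [sample_scene_config(rng) for _ in range(4)]
```

Seeding the generator is a design choice for reproducibility: the same seed yields the same sequence of scene configurations, so a synthetic dataset can be regenerated exactly.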

updated: Sun Jul 16 2023 05:44:57 GMT+0000 (UTC)

published: Thu Jun 29 2023 08:13:57 GMT+0000 (UTC)

arXiv
