PeopleSansPeople: A Synthetic Data Generator for Human-Centric Computer Vision

Salehe Erfanian Ebadi; You-Cyuan Jhang; Alex Zook; Saurav Dhakad; Adam Crespi; Pete Parisi; Steven Borkman; Jonathan Hogins; Sujoy Ganguly

In recent years, person detection and human pose estimation have made great strides, helped by large-scale labeled datasets. However, these datasets come with no guarantees or analysis of the diversity of human activities, poses, or context. Additionally, privacy, legal, safety, and ethical concerns may limit the ability to collect more human data. An emerging alternative to real-world data that alleviates some of these issues is synthetic data. However, creating synthetic data generators is incredibly challenging, which prevents researchers from exploring their usefulness. Therefore, we release a human-centric synthetic data generator, PeopleSansPeople, which contains simulation-ready 3D human assets and a parameterized lighting and camera system, and which generates 2D and 3D bounding box, instance and semantic segmentation, and COCO pose labels. Using PeopleSansPeople, we performed benchmark synthetic data training using a Detectron2 Keypoint R-CNN variant [1]. We found that pre-training a network on synthetic data and fine-tuning on various sizes of real-world data resulted in a keypoint AP increase of +38.03 (44.43 ± 0.17 vs. 6.40) for few-shot transfer (limited subsets of COCO-person train [2]), and an increase of +1.47 (63.47 ± 0.19 vs. 62.00) for abundant real data regimes, outperforming models trained with the same real data alone. We also found that our models outperformed those pre-trained with ImageNet, with a keypoint AP increase of +22.53 (44.43 ± 0.17 vs. 21.90) for few-shot transfer and +1.07 (63.47 ± 0.19 vs. 62.40) for abundant real data regimes. This freely available data generator should enable a wide range of research into the emerging field of simulation-to-real transfer learning in the critical area of human-centric computer vision.
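The reported gains are plain differences between the mean keypoint AP of the synthetic-pre-trained model and each baseline. As a quick check of that arithmetic, here is a minimal Python sketch; the names `REPORTED` and `ap_gain` are illustrative and do not come from the paper or its code release, and the numbers are copied from the abstract above.

```python
# Keypoint AP values quoted in the abstract:
# (synthetic pre-training + fine-tuning, baseline).
# The dictionary layout and names are illustrative assumptions.
REPORTED = {
    ("few-shot", "real data only"): (44.43, 6.40),
    ("few-shot", "ImageNet pre-training"): (44.43, 21.90),
    ("abundant", "real data only"): (63.47, 62.00),
    ("abundant", "ImageNet pre-training"): (63.47, 62.40),
}


def ap_gain(ours: float, baseline: float) -> float:
    """Return the AP difference in points, rounded to two decimals."""
    return round(ours - baseline, 2)


for (regime, baseline_name), (ours, baseline) in REPORTED.items():
    gain = ap_gain(ours, baseline)
    print(f"{regime:9s} vs. {baseline_name:22s}: {gain:+.2f}")
```

The four differences match the +38.03, +22.53, +1.47, and +1.07 figures stated in the abstract. The ± values on the synthetic-pre-trained results are left out of this check because the abstract does not say how they were computed.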

updated: Tue Jul 12 2022 01:30:11 GMT+0000 (UTC)

published: Fri Dec 17 2021 02:33:31 GMT+0000 (UTC)

arXiv
