Cascaded deep monocular 3D human pose estimation with evolutionary training data

Shichao Li; Lei Ke; Kevin Pratama; Yu-Wing Tai; Chi-Keung Tang; Kwang-Ting Cheng

End-to-end deep representation learning has achieved remarkable accuracy for monocular 3D human pose estimation, yet these models may fail on unseen poses when trained on limited and fixed training data. This paper proposes a novel data augmentation method that (1) scales to synthesize a massive amount of training data (over 8 million valid 3D human poses with corresponding 2D projections) for training 2D-to-3D networks, and (2) can effectively reduce dataset bias. Our method evolves a limited dataset to synthesize unseen 3D human skeletons based on a hierarchical human representation and heuristics inspired by prior knowledge. Extensive experiments show that our approach not only achieves state-of-the-art accuracy on the largest public benchmark, but also generalizes significantly better to unseen and rare poses. Code, pre-trained models and tools are available at this HTTPS URL.
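The abstract does not spell out the evolution operators, so the following is only a minimal sketch of how a hierarchical skeleton could be "evolved" and paired with a 2D projection. It assumes, purely for illustration, a crossover that swaps a subtree of bone vectors between two parent poses, a mutation that rotates one bone while preserving its length, and a simple pinhole camera. The toy 5-joint hierarchy, the function names (`crossover`, `mutate`, `project`), and the focal length are all hypothetical and are not taken from the authors' released code.

```python
import math
import random

# Hypothetical toy hierarchy: PARENTS[i] is the parent joint of joint i (-1 = root).
# Parents always have a smaller index than their children.
PARENTS = [-1, 0, 1, 0, 3]

def subtree(j):
    """Return joint j and all of its descendants in the hierarchy."""
    nodes = [j]
    for i in range(j + 1, len(PARENTS)):
        if PARENTS[i] in nodes:
            nodes.append(i)
    return nodes

def crossover(bones_a, bones_b, j):
    """Copy pose A, but take the bone vectors of the subtree rooted at j from pose B."""
    child = list(bones_a)
    for i in subtree(j):
        child[i] = bones_b[i]
    return child

def mutate(bones, j, max_angle, rng):
    """Rotate bone j about the z-axis by a random angle; its length is unchanged."""
    t = rng.uniform(-max_angle, max_angle)
    x, y, z = bones[j]
    child = list(bones)
    child[j] = (x * math.cos(t) - y * math.sin(t),
                x * math.sin(t) + y * math.cos(t),
                z)
    return child

def joints_from_bones(bones, root=(0.0, 0.0, 4.0)):
    """Walk the hierarchy: each joint is its parent's position plus its bone vector."""
    joints = [root]
    for i in range(1, len(PARENTS)):
        px, py, pz = joints[PARENTS[i]]
        bx, by, bz = bones[i]
        joints.append((px + bx, py + by, pz + bz))
    return joints

def project(joints, focal=1000.0):
    """Pinhole projection of 3D joints (camera coordinates, z > 0) to 2D."""
    return [(focal * x / z, focal * y / z) for x, y, z in joints]

# Two made-up parent poses, given as per-joint bone vectors (index 0 is the root, unused).
a = [None, (0.0, 0.5, 0.0), (0.0, 0.5, 0.0), (0.3, 0.0, 0.0), (0.3, 0.0, 0.0)]
b = [None, (0.5, 0.0, 0.0), (0.0, 0.0, 0.5), (0.0, -0.3, 0.0), (0.0, 0.0, 0.3)]

rng = random.Random(0)
child = mutate(crossover(a, b, 3), 1, math.pi / 6, rng)
pose_3d = joints_from_bones(child)   # synthesized 3D skeleton
pose_2d = project(pose_3d)           # matching 2D input for a 2D-to-3D network
```

Operating on bone vectors rather than raw joint coordinates keeps bone lengths fixed under both operators, which is one plausible reason to use a hierarchical representation. A real pipeline would also need a validity check (for example, joint-angle limits) before accepting a synthesized pose; that step is omitted here.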

updated: Thu Apr 08 2021 08:08:15 GMT+0000 (UTC)

published: Sun Jun 14 2020 03:09:52 GMT+0000 (UTC)

arXiv
