Multi-view Human Pose and Shape Estimation Using Learnable Volumetric Aggregation

Soyong Shin; Eni Halilaj

Human pose and shape estimation from RGB images is a highly sought-after alternative to marker-based motion capture, which is laborious, requires expensive equipment, and constrains capture to laboratory environments. Monocular vision-based algorithms, however, still suffer from rotational ambiguities and are not ready for translation to healthcare applications, where high accuracy is paramount. While fusing data from multiple viewpoints could overcome these challenges, current algorithms require further improvement to reach clinically acceptable accuracy. In this paper, we propose a learnable volumetric aggregation approach to reconstruct 3D human body pose and shape from calibrated multi-view images. We use a parametric representation of the human body, which makes our approach directly applicable to medical applications. Compared to previous approaches, our framework shows higher accuracy and, given its cost efficiency, greater promise for real-time prediction.

updated: Thu Nov 26 2020 18:33:35 GMT+0000 (UTC)

published: Thu Nov 26 2020 18:33:35 GMT+0000 (UTC)

arXiv
