Self-Supervised 3D Human Pose Estimation in Static Video Via Neural Rendering

Luca Schmidtke; Benjamin Hou; Athanasios Vlontzos; Bernhard Kainz

Inferring 3D human pose from 2D images is a challenging and long-standing problem in computer vision, with many applications including motion capture, virtual reality, surveillance, and gait analysis for sports and medicine. We present preliminary results for a method that estimates 3D pose from 2D video containing a single person and a static background, without the need for any manual landmark annotations. We achieve this by formulating a simple yet effective self-supervision task: our model is required to reconstruct a random frame of a video given a frame from another timepoint and a rendered image of a transformed human shape template. Crucially for optimisation, our ray-casting-based rendering pipeline is fully differentiable, enabling end-to-end training based solely on the reconstruction task.
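
The self-supervision task described above can be sketched roughly as follows. This is a minimal illustration in plain Python, not the authors' implementation: the names `sample_frame_pair`, `training_loss`, `model`, and `render_template` are hypothetical, frames are represented as flat lists of pixel values, mean squared error stands in for whatever reconstruction loss the paper uses, and the choice to derive the rendered template from the target frame is an assumption the abstract does not spell out.

```python
import random


def sample_frame_pair(num_frames, rng):
    """Pick two distinct frame indices, returned as (source, target)."""
    source, target = rng.sample(range(num_frames), 2)
    return source, target


def mse(a, b):
    """Mean squared error between two equally sized flat pixel lists."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def training_loss(video, model, render_template, rng):
    """Compute the reconstruction loss for one randomly sampled frame pair.

    `model` and `render_template` are hypothetical callables:
    - render_template(frame) returns a rendered image of the transformed
      shape template (assumed here to be driven by the target frame).
    - model(source_frame, rendered) returns the reconstructed target frame.
    """
    s, t = sample_frame_pair(len(video), rng)
    rendered = render_template(video[t])    # assumption: pose comes from the target frame
    prediction = model(video[s], rendered)  # reconstruct the target from source + rendering
    return mse(prediction, video[t])
```

In an actual training setup, `model` and `render_template` would be differentiable modules so that the gradient of this loss could flow back through the renderer to the pose estimate. The sketch only shows how the frame pair and the reconstruction target fit together.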

updated: Mon Oct 10 2022 09:24:07 GMT+0000 (UTC)

published: Mon Oct 10 2022 09:24:07 GMT+0000 (UTC)

arXiv
