Direct Multi-view Multi-person 3D Pose Estimation

Tao Wang; Jianfeng Zhang; Yujun Cai; Shuicheng Yan; Jiashi Feng

We present the Multi-view Pose transformer (MvP) for estimating multi-person 3D poses from multi-view images. Instead of estimating 3D joint locations from a costly volumetric representation or reconstructing the per-person 3D pose from multiple detected 2D poses as in previous methods, MvP directly regresses the multi-person 3D poses in a clean and efficient way, without relying on intermediate tasks. Specifically, MvP represents skeleton joints as learnable query embeddings and lets them progressively attend to and reason over the multi-view information from the input images to directly regress the actual 3D joint locations. To improve the accuracy of such a simple pipeline, MvP presents a hierarchical scheme to concisely represent the query embeddings of multi-person skeleton joints and introduces an input-dependent query adaptation approach. Further, MvP designs a novel geometrically guided attention mechanism, called projective attention, to fuse the cross-view information for each joint more precisely. MvP also introduces a RayConv operation that integrates the view-dependent camera geometry into the feature representations to augment the projective attention. We show experimentally that our MvP model outperforms the state-of-the-art methods on several benchmarks while being much more efficient. Notably, it achieves 92.3% AP25 on the challenging Panoptic dataset, improving upon the previous best approach [36] by 9.8%. MvP is general and can also be extended to recover human meshes represented by the SMPL model, which makes it useful for modeling multi-person body shapes. Code and models are available at https://github.com/sail-sg/mvp.
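As a rough illustration of two ideas named in the abstract, the sketch below shows (1) one possible way to build hierarchical joint queries and (2) the pinhole projection that a geometrically guided attention could use to find where a 3D joint estimate lands in each camera view. This is not the authors' implementation. The abstract does not say how person-level and joint-level embeddings are combined, so the element-wise sum here is an assumption, as is the use of the projected point as a per-view anchor. All function names (`make_joint_queries`, `project_point`) and the example camera parameters are hypothetical.

```python
from typing import List, Sequence, Tuple

Matrix = Sequence[Sequence[float]]


def make_joint_queries(
    person_emb: Sequence[Sequence[float]],
    joint_emb: Sequence[Sequence[float]],
) -> List[List[List[float]]]:
    """Build one query per (person, joint) pair.

    Assumption: the query is the element-wise sum of a person-level and a
    joint-level embedding, so only P + J vectors are stored instead of P * J.
    """
    return [
        [[p + j for p, j in zip(pe, je)] for je in joint_emb]
        for pe in person_emb
    ]


def project_point(
    K: Matrix, R: Matrix, t: Sequence[float], X: Sequence[float]
) -> Tuple[float, float]:
    """Project a 3D world point X to pixel coordinates with a pinhole camera.

    Computes x ~ K (R X + t) and divides by depth. K is the 3x3 intrinsic
    matrix, R the 3x3 rotation, and t the translation of the camera.
    """
    # World coordinates -> camera coordinates.
    cam = [sum(R[i][k] * X[k] for k in range(3)) + t[i] for i in range(3)]
    if cam[2] <= 0:
        raise ValueError("point is behind the camera")
    # Camera coordinates -> homogeneous image coordinates.
    img = [sum(K[i][k] * cam[k] for k in range(3)) for i in range(3)]
    return (img[0] / img[2], img[1] / img[2])


if __name__ == "__main__":
    # Two people, three joints, embedding dimension 2 (toy sizes).
    persons = [[1.0, 0.0], [0.0, 1.0]]
    joints = [[0.1, 0.1], [0.2, 0.2], [0.3, 0.3]]
    queries = make_joint_queries(persons, joints)
    print(len(queries), len(queries[0]), len(queries[0][0]))

    # Two hypothetical cameras sharing intrinsics, the second shifted in x.
    K = [[1000.0, 0.0, 500.0], [0.0, 1000.0, 400.0], [0.0, 0.0, 1.0]]
    I3 = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    joint_3d = (0.1, -0.2, 2.0)
    for t in [(0.0, 0.0, 0.0), (-0.1, 0.0, 0.0)]:
        print(project_point(K, I3, t, joint_3d))
```

In a full model, features would then be sampled near each projected anchor and fused across views; that step depends on details the abstract does not give, so it is left out here.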

updated: Sat Nov 27 2021 05:31:24 GMT+0000 (UTC)

published: Sun Nov 07 2021 13:09:20 GMT+0000 (UTC)

arXiv
