MUG: Multi-human Graph Network for 3D Mesh Reconstruction from 2D Pose

Chenyan Wu; Yandong Li; Xianfeng Tang; James Wang

Reconstructing multi-human body meshes from a single monocular image is an important but challenging computer vision problem. In addition to the individual body mesh models, we need to estimate the relative 3D positions among subjects to generate a coherent representation. In this work, we construct coherent multi-human meshes with a single graph neural network, named MUG (Multi-hUman Graph network), using only multi-human 2D poses as input. Existing methods adopt a detection-style pipeline (i.e., extracting image features, then locating human instances and recovering body meshes from them) and suffer from the significant domain gap between lab-collected training datasets and in-the-wild testing datasets. In contrast, our method benefits from 2D pose, whose geometric properties are relatively consistent across datasets. Our method works as follows. First, to model the multi-human environment, it processes multi-human 2D poses and builds a novel heterogeneous graph, in which nodes from different people and within one person are connected to capture inter-human interactions and to represent the body geometry (i.e., skeleton and mesh structure). Second, it employs a dual-branch graph neural network structure: one branch predicts inter-human depth relations and the other predicts root-joint-relative mesh coordinates. Finally, the full multi-human 3D meshes are constructed by combining the outputs of both branches. Extensive experiments demonstrate that MUG outperforms previous multi-human mesh estimation methods on standard 3D human benchmarks: Panoptic, MuPoTS-3D, and 3DPW.
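The graph construction and the final combination step described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the 5-joint toy skeleton, the rule that links the same joint across people, the function names `build_multi_human_graph` and `assemble_scene`, and the assumption that the depth branch ultimately yields one 3D root position per person are all hypothetical choices made for illustration. The graph neural network branches themselves are omitted.

```python
import numpy as np

# Hypothetical 5-joint toy skeleton (NOT the paper's joint set): undirected bone pairs.
SKELETON_EDGES = [(0, 1), (1, 2), (0, 3), (3, 4)]
NUM_JOINTS = 5


def build_multi_human_graph(num_people, num_joints=NUM_JOINTS, skeleton=SKELETON_EDGES):
    """Return (intra, inter) boolean adjacency matrices over all joint nodes.

    Node index = person * num_joints + joint.
    intra: skeleton edges within each person (body geometry).
    inter: edges linking the same joint across different people
           (an assumed stand-in for the paper's inter-human connections).
    """
    n = num_people * num_joints
    intra = np.zeros((n, n), dtype=bool)
    inter = np.zeros((n, n), dtype=bool)
    for p in range(num_people):
        base = p * num_joints
        for a, b in skeleton:
            intra[base + a, base + b] = intra[base + b, base + a] = True
        for q in range(p + 1, num_people):
            for j in range(num_joints):
                u, v = base + j, q * num_joints + j
                inter[u, v] = inter[v, u] = True
    return intra, inter


def assemble_scene(root_relative_vertices, root_positions):
    """Place each person's root-relative mesh at that person's root position.

    root_relative_vertices: array of shape (P, V, 3), one mesh per person.
    root_positions: array of shape (P, 3), assumed to come from the depth branch.
    """
    return root_relative_vertices + root_positions[:, None, :]


# Usage: two people, toy meshes with 4 vertices each, all at the root.
intra, inter = build_multi_human_graph(num_people=2)
print(intra.sum() // 2, inter.sum() // 2)  # undirected edge counts: 8 5

meshes = np.zeros((2, 4, 3))
roots = np.array([[0.0, 0.0, 2.0], [1.0, 0.0, 3.0]])
scene = assemble_scene(meshes, roots)
```

Keeping the two edge types in separate adjacency matrices is one simple way to represent a heterogeneous graph, since a network can then apply different weights to within-person and between-person messages.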

updated: Fri Jul 21 2023 18:41:39 GMT+0000 (UTC)

published: Wed May 25 2022 08:54:52 GMT+0000 (UTC)

arXiv
