Conditional Directed Graph Convolution for 3D Human Pose Estimation

Wenbo Hu; Changgong Zhang; Fangneng Zhan; Lei Zhang; Tien-Tsin Wong

Graph convolutional networks have significantly improved 3D human pose estimation by representing the human skeleton as an undirected graph. However, this representation fails to reflect the articulated nature of the human skeleton, because the hierarchical order among the joints is not explicitly represented. In this paper, we propose to represent the human skeleton as a directed graph, with joints as nodes and bones as edges directed from parent joints to child joints. In this way, the edge directions explicitly reflect the hierarchical relationships among the nodes. Based on this representation, we further propose a spatial-temporal conditional directed graph convolution that leverages varying non-local dependencies for different poses by conditioning the graph topology on the input poses. Altogether, we form a U-shaped network, named U-shaped Conditional Directed Graph Convolutional Network, for 3D human pose estimation from monocular videos. To evaluate the effectiveness of our method, we conducted extensive experiments on two challenging large-scale benchmarks: Human3.6M and MPI-INF-3DHP. Both quantitative and qualitative results show that our method achieves top performance. In addition, ablation studies show that directed graphs exploit the hierarchy of articulated human skeletons better than undirected graphs, and that the conditional connections yield adaptive graph topologies for different poses.
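To make the directed-graph representation concrete, the following is a minimal sketch of how a skeleton could be encoded with edges pointing from parent joints to child joints, alongside the undirected encoding it is contrasted with. It is an illustration only and not the authors' implementation: the 6-joint skeleton in PARENTS, the function names, and the simple parent-to-child aggregation are all assumptions made for this example, and the conditional (pose-dependent) connections described in the abstract are not modeled.

```python
import numpy as np

# Hypothetical toy skeleton (an assumption, not a benchmark layout):
# PARENTS[i] is the parent joint of joint i, and -1 marks the root.
PARENTS = [-1, 0, 1, 0, 3, 0]


def directed_adjacency(parents):
    """Adjacency matrix with adj[parent, child] = 1 for every bone."""
    n = len(parents)
    adj = np.zeros((n, n), dtype=np.float32)
    for child, parent in enumerate(parents):
        if parent >= 0:  # the root has no incoming bone
            adj[parent, child] = 1.0
    return adj


def undirected_adjacency(parents):
    """Symmetric adjacency: the same bones with the direction discarded."""
    adj = directed_adjacency(parents)
    return np.maximum(adj, adj.T)


def aggregate_from_parents(features, adj):
    """Each joint receives the features of its parent joint.

    features has shape (num_joints, channels); adj is the directed adjacency.
    """
    return adj.T @ features


directed = directed_adjacency(PARENTS)
undirected = undirected_adjacency(PARENTS)
joint_ids = np.arange(len(PARENTS), dtype=np.float32).reshape(-1, 1)
from_parents = aggregate_from_parents(joint_ids, directed)
```

In the directed matrix, the entry for (parent, child) is set while the reverse entry stays zero, so the hierarchy can be read directly from the edge directions. The undirected matrix sets both entries and therefore loses that ordering.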

updated: Wed Aug 04 2021 09:06:49 GMT+0000 (UTC)

published: Fri Jul 16 2021 09:50:40 GMT+0000 (UTC)

arXiv
