DECA: Deep viewpoint-Equivariant human pose estimation using Capsule Autoencoders

Nicola Garau; Niccolò Bisagno; Piotr Bródka; Nicola Conci

Human Pose Estimation (HPE) aims at retrieving the 3D position of human joints from images or videos. We show that current 3D HPE methods lack viewpoint equivariance: they tend to fail or perform poorly on viewpoints unseen at training time. Deep learning methods often rely on scale-invariant, translation-invariant, or rotation-invariant operations, such as max-pooling. However, adopting such operations does not necessarily improve viewpoint generalization and instead leads to more data-dependent methods. To tackle this issue, we propose DECA, a novel capsule autoencoder network with fast Variational Bayes capsule routing. By modeling each joint as a capsule entity and combining this with the routing algorithm, our approach preserves the joints' hierarchical and geometrical structure in the feature space, independently of the viewpoint. By achieving viewpoint equivariance, we drastically reduce the network's data dependency at training time, which improves its ability to generalize to unseen viewpoints. In the experimental validation, we outperform other methods on depth images from both seen and unseen viewpoints, in both top-view and front-view settings. In the RGB domain, the same network gives state-of-the-art results on the challenging viewpoint transfer task, also establishing a new framework for top-view HPE. The code can be found at https://github.com/mmlab-cv/DECA.
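The distinction between invariance and equivariance that the abstract relies on can be illustrated with a small toy sketch. This is not DECA's architecture or code: it assumes a planar rotation as a stand-in for a viewpoint change, and the skeleton coordinates and function names (`rotate`, `centered`, `max_radius`) are hypothetical. An equivariant feature transforms along with the input, so the viewpoint information is kept; an invariant, pooled feature discards it.

```python
import math

def rotate(points, angle):
    """Rotate 2D points about the origin by `angle` radians (toy viewpoint change)."""
    c, s = math.cos(angle), math.sin(angle)
    return [(c * x - s * y, s * x + c * y) for x, y in points]

def centered(points):
    """Equivariant feature: joint coordinates relative to their centroid."""
    n = len(points)
    cx = sum(x for x, _ in points) / n
    cy = sum(y for _, y in points) / n
    return [(x - cx, y - cy) for x, y in points]

def max_radius(points):
    """Invariant feature: largest joint distance from the centroid (a pooled scalar)."""
    return max(math.hypot(x, y) for x, y in centered(points))

def close(a, b, tol=1e-9):
    """Compare two lists of 2D points coordinate by coordinate."""
    return all(abs(p - q) <= tol for u, v in zip(a, b) for p, q in zip(u, v))

# Hypothetical toy skeleton: head, neck, left hand, right hand.
joints = [(0.0, 1.7), (0.0, 1.4), (-0.4, 1.0), (0.4, 1.0)]
angle = math.pi / 3

# Equivariance: f(R x) == R f(x), the feature rotates with the input.
is_equivariant = close(centered(rotate(joints, angle)),
                       rotate(centered(joints), angle))

# Invariance: g(R x) == g(x), the feature ignores the rotation entirely.
is_invariant = abs(max_radius(rotate(joints, angle)) - max_radius(joints)) < 1e-9

print(is_equivariant, is_invariant)
```

In this sketch the pooled scalar is identical for every rotation, so nothing downstream can recover the viewpoint from it, whereas the centered coordinates still carry the full geometric arrangement of the joints under the new viewpoint.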

updated: Thu Aug 19 2021 08:46:15 GMT+0000 (UTC)

published: Thu Aug 19 2021 08:46:15 GMT+0000 (UTC)

arXiv
