Improving Robustness and Accuracy via Relative Information Encoding in 3D Human Pose Estimation

Wenkang Shan; Haopeng Lu; Shanshe Wang; Xinfeng Zhang; Wen Gao

Most existing 3D human pose estimation approaches focus on predicting the 3D positional relationships between the root joint and the other human joints (local motion) rather than the overall trajectory of the human body (global motion). Despite the great progress these approaches have achieved, they are not robust to global motion and lack the ability to accurately predict local motion with a small movement range. To alleviate these two problems, we propose a relative information encoding method that yields positionally and temporally enhanced representations. First, we encode positional information by using the relative coordinates of 2D poses, which improves the consistency between the input and output distributions. The same posture at different absolute 2D positions can then be mapped to a common representation, which helps the predictions resist interference from global motion. Second, we encode temporal information by establishing connections between the current pose and other poses of the same person within a period of time. This directs more attention to the movement changes before and after the current pose, resulting in better prediction performance on local motion with a small movement range. Ablation studies validate the effectiveness of the proposed relative information encoding method. In addition, we introduce a multi-stage optimization method to the whole framework to further exploit the positionally and temporally enhanced representations. Our method outperforms state-of-the-art methods on two public datasets. Code is available at https://github.com/paTRICK-swk/Pose3D-RIE.
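
The two encoding ideas in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation (see the repository linked above for that). It assumes, purely for illustration, that positional encoding subtracts a root joint's 2D coordinates from every joint and that temporal encoding subtracts the current frame's pose from every pose in a window. The function names `encode_positional` and `encode_temporal`, the `root_index` parameter, and the toy coordinates are all hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float]
Pose = List[Point]        # one 2D pose: a list of (x, y) joints
Window = List[Pose]       # poses of one person over consecutive frames


def encode_positional(pose: Pose, root_index: int = 0) -> Pose:
    """Express every joint relative to the root joint.

    Assumption: the root joint sits at root_index in the pose.
    """
    rx, ry = pose[root_index]
    return [(x - rx, y - ry) for x, y in pose]


def encode_temporal(window: Window, current: int) -> Window:
    """Express every pose in the window relative to the current frame's pose."""
    ref = window[current]
    return [
        [(x - cx, y - cy) for (x, y), (cx, cy) in zip(pose, ref)]
        for pose in window
    ]


# Toy data: the same posture at two absolute image positions.
pose_a: Pose = [(100.0, 200.0), (110.0, 180.0), (90.0, 220.0)]
pose_b: Pose = [(150.0, 170.0), (160.0, 150.0), (140.0, 190.0)]  # pose_a shifted by (50, -30)

# Both poses map to the same root-relative representation.
same_representation = encode_positional(pose_a) == encode_positional(pose_b)

# Relative to frame 0, frame 1 differs only by the global shift.
relative_window = encode_temporal([pose_a, pose_b], current=0)
```

In this toy case the root-relative form removes the absolute image position, so a pure translation of the person no longer changes the input. The temporal form keeps only how each frame differs from the current one, which is the signal the abstract associates with small-range local motion.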

updated: Thu Jul 29 2021 14:12:19 GMT+0000 (UTC)

published: Thu Jul 29 2021 14:12:19 GMT+0000 (UTC)

arXiv
