Self-Attentive 3D Human Pose and Shape Estimation from Videos

Yun-Chun Chen; Marco Piccirilli; Robinson Piramuthu; Ming-Hsuan Yang

We consider the task of estimating 3D human pose and shape from videos. While existing frame-based approaches have made significant progress, these methods are applied to each image independently and therefore often produce temporally inconsistent predictions. In this work, we present a video-based learning algorithm for 3D human pose and shape estimation. The key insights of our method are two-fold. First, to address the inconsistent temporal prediction issue, we exploit temporal information in videos and propose a self-attention module that jointly considers short-range and long-range dependencies across frames, resulting in temporally coherent estimations. Second, we model human motion with a forecasting module that allows the transition between adjacent frames to be smooth. We evaluate our method on the 3DPW, MPI-INF-3DHP, and Human3.6M datasets. Extensive experimental results show that our algorithm performs favorably against state-of-the-art methods.
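To make the idea of attending across frames concrete, here is a minimal sketch of scaled dot-product self-attention applied along the time axis of per-frame features. It is an illustration under stated assumptions, not the authors' implementation: the function names (`softmax`, `temporal_self_attention`) and the toy features are hypothetical, and the queries, keys, and values are the raw features themselves, whereas a trained model would normally apply learned projections first. Because every frame attends to every other frame, nearby and distant frames can both contribute to each output.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def temporal_self_attention(frames):
    """Scaled dot-product self-attention over a sequence of frame features.

    frames: list of T feature vectors (lists of floats of equal length).
    Assumption: identity projections stand in for learned query/key/value
    weights, so this only illustrates the attention pattern.
    """
    dim = len(frames[0])
    scale = math.sqrt(dim)
    attended = []
    for query in frames:
        # Similarity of this frame to every frame in the clip (near or far).
        scores = [sum(q * k for q, k in zip(query, key)) / scale
                  for key in frames]
        weights = softmax(scores)
        # Each output is a weighted average of all frame features.
        attended.append([
            sum(w * value[d] for w, value in zip(weights, frames))
            for d in range(dim)
        ])
    return attended


# Toy clip: four frames with 2-D features (hypothetical values).
features = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [1.0, 0.1]]
smoothed = temporal_self_attention(features)
```

Each output vector is a convex combination of the input frames, which is one simple way to see why attention over time tends to pull per-frame estimates toward one another.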

updated: Tue Sep 07 2021 01:28:20 GMT+0000 (UTC)

published: Fri Mar 26 2021 00:02:19 GMT+0000 (UTC)

arXiv
