Body2Hands: Learning to Infer 3D Hands from Conversational Gesture Body Dynamics

Evonne Ng; Shiry Ginosar; Trevor Darrell; Hanbyul Joo

We propose a novel learned deep prior of body motion for 3D hand shape synthesis and estimation in the domain of conversational gestures. Our model builds upon the insight that body motion and hand gestures are strongly correlated in non-verbal communication settings. We formulate the learning of this prior as a prediction task of 3D hand shape over time given body motion input alone. Trained with 3D pose estimations obtained from a large-scale dataset of internet videos, our hand prediction model produces convincing 3D hand gestures given only the 3D motion of the speaker's arms as input. We demonstrate the efficacy of our method on hand gesture synthesis from body motion input, and as a strong body prior for single-view image-based 3D hand pose estimation. We also show that our method outperforms previous state-of-the-art approaches and can generalize beyond the monologue-based training data to multi-person conversations. Video results are available at http://people.eecs.berkeley.edu/~evonne_ng/projects/body2hands/.

updated: Sun Apr 04 2021 23:13:34 GMT+0000 (UTC)

published: Thu Jul 23 2020 22:58:15 GMT+0000 (UTC)

arXiv
