Generative Proxemics: A Prior for 3D Social Interaction from Images

Lea Müller; Vickie Ye; Georgios Pavlakos; Michael Black; Angjoo Kanazawa

Social interaction is a fundamental aspect of human behavior and communication. The way individuals position themselves in relation to others, also known as proxemics, conveys social cues and affects the dynamics of social interaction. We present a novel approach that learns a 3D proxemics prior of two people in close social interaction. Since collecting a large 3D dataset of interacting people is a challenge, we rely on 2D image collections where social interactions are abundant. We achieve this by reconstructing pseudo-ground truth 3D meshes of interacting people from images with an optimization approach using existing ground-truth contact maps. We then model the proxemics using a novel denoising diffusion model called BUDDI that learns the joint distribution of two people in close social interaction directly in the SMPL-X parameter space. Sampling from our generative proxemics model produces realistic 3D human interactions, which we validate through a user study. Additionally, we introduce a new optimization method that uses the diffusion prior to reconstruct two people in close proximity from a single image without any contact annotation. Our approach recovers more accurate and plausible 3D social interactions from noisy initial estimates and outperforms state-of-the-art methods. See our project site for code, data, and model: muelea.github.io/buddi.
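
To make the "joint distribution of two people" idea concrete, here is a minimal sketch of the forward (noising) half of a generic denoising diffusion model, applied to one concatenated vector holding both people's body parameters. It is not the BUDDI implementation. The parameter dimension, the linear noise schedule, and every function name below are assumptions chosen for illustration, and the learned denoising network is omitted entirely.

```python
import math
import random


def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Linearly spaced noise variances (a common default; assumed here)."""
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def cumulative_alphas(betas):
    """Running product of (1 - beta_t), usually written alpha-bar_t."""
    out, prod = [], 1.0
    for beta in betas:
        prod *= 1.0 - beta
        out.append(prod)
    return out


def noise_joint_sample(person_a, person_b, t, alpha_bars, rng):
    """Draw x_t ~ q(x_t | x_0) for the concatenated two-person vector.

    Both people are noised together as one vector, so a denoiser trained
    on such samples would see them jointly rather than independently.
    """
    x0 = list(person_a) + list(person_b)
    signal = math.sqrt(alpha_bars[t])
    sigma = math.sqrt(1.0 - alpha_bars[t])
    noise = [rng.gauss(0.0, 1.0) for _ in x0]
    x_t = [signal * v + sigma * e for v, e in zip(x0, noise)]
    return x_t, noise


# Hypothetical per-person parameter size; real body-model vectors are larger.
PARAM_DIM = 8

rng = random.Random(0)
alpha_bars = cumulative_alphas(linear_beta_schedule(1000))
person_a = [0.1] * PARAM_DIM   # placeholder values, not real pose data
person_b = [-0.1] * PARAM_DIM
x_t, eps = noise_joint_sample(person_a, person_b, 500, alpha_bars, rng)
```

In a full model, a network would be trained to predict the clean vector or the added noise from `x_t` and `t`. Sampling new interactions would then run the learned denoising steps starting from pure noise.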

updated: Thu Jun 15 2023 17:59:20 GMT+0000 (UTC)

published: Thu Jun 15 2023 17:59:20 GMT+0000 (UTC)

arXiv
