CHAIRS: Towards Full-Body Articulated Human-Object Interaction

Nan Jiang; Tengyu Liu; Zhexuan Cao; Jieming Cui; Yixin Chen; He Wang; Yixin Zhu; Siyuan Huang

Fine-grained capture of 3D human-object interaction (HOI) improves human activity understanding and facilitates downstream visual tasks, including action recognition, holistic scene reconstruction, and human motion synthesis. Despite its significance, existing works mostly assume that humans interact with rigid objects using only a few body parts, which limits their scope. In this paper, we address the challenging problem of full-body articulated human-object interaction (f-AHOI), wherein the whole human body interacts with articulated objects whose parts are connected by movable joints. We present CHAIRS, a large-scale motion-captured f-AHOI dataset consisting of 16.2 hours of versatile interactions between 46 participants and 81 articulated and rigid sittable objects. CHAIRS provides 3D meshes of both humans and articulated objects throughout the entire interaction, as well as realistic and physically plausible full-body interactions. We demonstrate the value of CHAIRS on object pose estimation. By learning the geometrical relationships in HOI, we devise the very first model that leverages human pose estimation to tackle the estimation of articulated object poses and shapes during whole-body interactions. Given an image and an estimated human pose, our model first reconstructs the pose and shape of the object, then optimizes the reconstruction according to a learned interaction prior. Under both evaluation settings (i.e., with or without knowledge of the objects' geometries/structures), our model significantly outperforms baselines. We hope CHAIRS will move the community toward finer-grained interaction understanding. We will make the data/code publicly available.

updated: Tue Dec 20 2022 19:50:54 GMT+0000 (UTC)

published: Tue Dec 20 2022 19:50:54 GMT+0000 (UTC)

arXiv
