Task-Oriented Human-Object Interactions Generation with Implicit Neural Representations

Quanzhou Li; Jingbo Wang; Chen Change Loy; Bo Dai

Digital human motion synthesis is a vibrant research field with applications in movies, AR/VR, and video games. Although methods have been proposed to generate natural and realistic human motions, most focus only on modeling humans and largely ignore object movements. Generating task-oriented human-object interaction motions in simulation is challenging: humans perform different motions depending on how they intend to use an object, so the human must first approach the object and then move it consistently with the body rather than leaving it static. In addition, for deployment in downstream applications, the synthesized motions should be flexible in length, so that the predicted motions can be personalized for various purposes. To this end, we propose TOHO: Task-Oriented Human-Object Interactions Generation with Implicit Neural Representations, which generates full human-object interaction motions for specific tasks, given only the task type, the object, and a starting human status. TOHO generates human-object motions in three steps: 1) it first estimates the keyframe poses for conducting a task, given the task type and object information; 2) it then infills the keyframes to generate continuous motions; 3) finally, it applies a compact closed-form object motion estimation to generate the object motion. Our method generates continuous motions that are parameterized only by the temporal coordinate, which allows the sequence to be upsampled or downsampled to an arbitrary number of frames and the motion speed to be adjusted by designing the temporal coordinate vector. We demonstrate the effectiveness of our method both qualitatively and quantitatively. This work takes a step further toward general human-scene interaction simulation.
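The idea of a motion parameterized only by the temporal coordinate can be illustrated with a minimal sketch. It is not the TOHO model: `toy_motion` is a hypothetical hand-written stand-in for a trained implicit network, and `linear_coords`, `ease_in_coords`, and `sample_motion` are names assumed here for illustration. The sketch only shows how resampling and speed changes reduce to choosing a different temporal coordinate vector.

```python
import math


def toy_motion(t, dim=3):
    """Hypothetical stand-in for a trained implicit motion function.

    Maps a temporal coordinate t in [0, 1] to a small "pose" vector.
    A real model would be a neural network; this is only a smooth toy.
    """
    return [math.sin(2.0 * math.pi * (k + 1) * t) for k in range(dim)]


def linear_coords(n_frames):
    """Evenly spaced temporal coordinates covering [0, 1]."""
    if n_frames == 1:
        return [0.0]
    return [i / (n_frames - 1) for i in range(n_frames)]


def ease_in_coords(n_frames, power=2.0):
    """Non-uniform coordinates: the motion starts slowly and ends faster."""
    return [t ** power for t in linear_coords(n_frames)]


def sample_motion(motion_fn, coords):
    """Query the continuous motion at each temporal coordinate."""
    return [motion_fn(t) for t in coords]


# Same continuous motion, three different temporal coordinate vectors.
low = sample_motion(toy_motion, linear_coords(5))     # downsampled
high = sample_motion(toy_motion, linear_coords(9))    # upsampled
warped = sample_motion(toy_motion, ease_in_coords(9)) # speed-adjusted
```

Because every frame is an independent query of the same function, the 9-frame sequence passes through the same poses as the 5-frame one at the shared coordinates, and the warped sequence keeps the same start and end poses while changing the pacing in between.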

updated: Sat Nov 04 2023 03:47:12 GMT+0000 (UTC)

published: Thu Mar 23 2023 09:31:56 GMT+0000 (UTC)

arXiv
