Learning to Imitate Object Interactions from Internet Videos

Austin Patel; Andrew Wang; Ilija Radosavovic; Jitendra Malik

We study the problem of imitating object interactions from Internet videos. This requires understanding the hand-object interactions in 4D, spatially in 3D and over time, which is challenging due to mutual hand-object occlusions. In this paper we make two main contributions: (1) a novel reconstruction technique RHOV (Reconstructing Hands and Objects from Videos), which reconstructs 4D trajectories of both the hand and the object using 2D image cues and temporal smoothness constraints; (2) a system for imitating object interactions in a physics simulator with reinforcement learning. We apply our reconstruction technique to 100 challenging Internet videos. We further show that we can successfully imitate a range of different object interactions in a physics simulator. Our object-centric approach is not limited to human-like end-effectors and can learn to imitate object interactions using different embodiments, like a robotic arm with a parallel jaw gripper.

updated: Wed Nov 23 2022 18:59:07 GMT+0000 (UTC)

published: Wed Nov 23 2022 18:59:07 GMT+0000 (UTC)

arXiv
