Affordance Grounding from Demonstration Video to Target Image

Joya Chen; Difei Gao; Kevin Qinghong Lin; Mike Zheng Shou

Humans excel at learning from expert demonstrations and solving their own problems. To equip intelligent robots and assistants, such as AR glasses, with this ability, it is essential to ground human hand interactions (i.e., affordances) in demonstration videos and apply them to a target image, such as a user's AR-glasses view. The video-to-image affordance grounding task is challenging because (1) it requires predicting fine-grained affordances, and (2) the limited training data inadequately covers video-image discrepancies, which harms grounding. To tackle these challenges, we propose Affordance Transformer (Afformer), which has a fine-grained transformer-based decoder that gradually refines affordance grounding. Moreover, we introduce Mask Affordance Hand (MaskAHand), a self-supervised pre-training technique that synthesizes video-image data and simulates context changes, enhancing affordance grounding across video-image discrepancies. Afformer with MaskAHand pre-training achieves state-of-the-art performance on multiple benchmarks, including a substantial 37% improvement on the OPRA dataset. Code is available at https://github.com/showlab/afformer.

updated: Sun Mar 26 2023 07:02:41 GMT+0000 (UTC)

published: Sun Mar 26 2023 07:02:41 GMT+0000 (UTC)

arXiv