One-Shot Object Affordance Detection in the Wild

Wei Zhai; Hongchen Luo; Jing Zhang; Yang Cao; Dacheng Tao

Affordance detection refers to identifying the potential action possibilities of objects in an image, which is a crucial ability for robot perception and manipulation. To empower robots with this ability in unseen scenarios, we first study the challenging one-shot affordance detection problem in this paper, i.e., given a support image that depicts the action purpose, all objects in a scene with the common affordance should be detected. To this end, we devise a One-Shot Affordance Detection Network (OSAD-Net) that first estimates the human action purpose and then transfers it to help detect the common affordance from all candidate images. Through collaboration learning, OSAD-Net can capture the common characteristics between objects having the same underlying affordance and learn a good adaptation capability for perceiving unseen affordances. In addition, we build a large-scale Purpose-driven Affordance Dataset v2 (PADv2) by collecting and labeling 30k images from 39 affordance categories and 103 object categories. With complex scenes and rich annotations, our PADv2 dataset can be used as a test bed to benchmark affordance detection methods and may also facilitate downstream vision tasks, such as scene understanding, action recognition, and robot manipulation. Specifically, we conducted comprehensive experiments on the PADv2 dataset by including 11 advanced models from several related research fields. Experimental results demonstrate the superiority of our model over previous representative ones in terms of both objective metrics and visual quality. The benchmark suite is available at https://github.com/lhc1224/OSAD_Net.

updated: Sun Aug 08 2021 14:53:10 GMT+0000 (UTC)

published: Sun Aug 08 2021 14:53:10 GMT+0000 (UTC)

arXiv
