Affordance segmentation of hand-occluded containers from exocentric images

Tommaso Apicella; Alessio Xompero; Edoardo Ragusa; Riccardo Berta; Andrea Cavallaro; Paolo Gastaldo

Visual affordance segmentation identifies the surfaces of an object that an agent can interact with. Common challenges in identifying affordances are the variety in geometry and physical properties of these surfaces, as well as occlusions. In this paper, we focus on occlusions of an object that is hand-held by the person manipulating it. To address this challenge, we propose an affordance segmentation model that uses auxiliary branches to process the object and hand regions separately. The proposed model learns affordance features under hand occlusion by weighting the feature map through hand and object segmentation. To train the model, we annotated the visual affordances of an existing dataset of mixed-reality images showing hand-held containers from a third-person (exocentric) view. Experiments on both real and mixed-reality images show that our model achieves better affordance segmentation and generalisation than existing models.
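
The abstract does not specify how the feature map is weighted, so the following is only a minimal sketch of one plausible reading. It assumes each auxiliary branch produces per-pixel segmentation logits, which are squashed to [0, 1] and multiplied element-wise with every channel of a shared feature map. The names `sigmoid` and `weight_features`, the toy shapes, and the choice of a sigmoid are illustrative assumptions, not the authors' implementation.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function (assumed squashing; not from the paper)."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def weight_features(features, mask_logits):
    """Weight each channel of a feature map by a soft segmentation mask.

    features:    list of C channels, each an H x W list of lists of floats
    mask_logits: H x W list of lists of floats (hypothetical auxiliary-branch output)
    """
    weights = [[sigmoid(v) for v in row] for row in mask_logits]
    return [
        [
            [f * w for f, w in zip(f_row, w_row)]
            for f_row, w_row in zip(channel, weights)
        ]
        for channel in features
    ]


# Toy feature map with 2 channels of size 2 x 2.
features = [
    [[1.0, 2.0], [3.0, 4.0]],
    [[0.5, 0.5], [0.5, 0.5]],
]
# Hypothetical logits: high where the branch believes the region is present.
object_logits = [[10.0, -10.0], [0.0, 10.0]]
hand_logits = [[-10.0, 10.0], [0.0, -10.0]]

# Two separately weighted copies, one per region, as in the "separate processing" idea.
object_features = weight_features(features, object_logits)
hand_features = weight_features(features, hand_logits)
```

In this sketch, pixels that a branch confidently assigns to its region keep their feature values almost unchanged, while the remaining pixels are suppressed towards zero, so downstream layers see object and hand evidence separately.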

updated: Tue Aug 22 2023 07:14:29 GMT+0000 (UTC)

published: Tue Aug 22 2023 07:14:29 GMT+0000 (UTC)

arXiv
