See, Hear, and Feel: Smart Sensory Fusion for Robotic Manipulation

Hao Li; Yizhi Zhang; Junzhe Zhu; Shaoxiong Wang; Michelle A Lee; Huazhe Xu; Edward Adelson; Li Fei-Fei; Ruohan Gao; Jiajun Wu

Humans use all of their senses to accomplish different tasks in everyday activities. In contrast, existing work on robotic manipulation mostly relies on one, or occasionally two, modalities, such as vision and touch. In this work, we systematically study how visual, auditory, and tactile perception can jointly help robots solve complex manipulation tasks. We build a robot system that can see with a camera, hear with a contact microphone, and feel with a vision-based tactile sensor, with all three sensory modalities fused by a self-attention model. Results on two challenging tasks, dense packing and pouring, demonstrate the necessity and power of multisensory perception for robotic manipulation: vision shows the global status of the robot but can often suffer from occlusion, audio provides immediate feedback on key moments that may not even be visible, and touch offers precise local geometry for decision making. Leveraging all three modalities, our robotic system significantly outperforms prior methods.
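The abstract says only that the three modalities are fused with a self-attention model. As a rough illustration of that idea, the sketch below assumes each modality (camera, contact microphone, tactile sensor) has already been encoded into a fixed-length vector, treats the three vectors as tokens, and applies single-head scaled dot-product self-attention before mean-pooling them into one fused feature. The function names, the embedding size, the random weights, and the mean-pooling step are all hypothetical choices for illustration, not the authors' architecture.

```python
import math
import random


def matmul(a, b):
    """Multiply an (n x k) matrix by a (k x m) matrix, both given as lists of rows."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(m):
    return [list(col) for col in zip(*m)]


def softmax(row):
    peak = max(row)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def random_matrix(rows, cols, rng):
    """Hypothetical stand-in for learned projection weights."""
    return [[rng.gauss(0.0, 1.0 / math.sqrt(rows)) for _ in range(cols)] for _ in range(rows)]


def fuse_modalities(tokens, w_q, w_k, w_v):
    """Single-head self-attention over modality tokens (one row per modality).

    Returns the mean-pooled fused feature and the (n x n) attention weights,
    where weights[i][j] is how much modality i attends to modality j.
    """
    q = matmul(tokens, w_q)
    k = matmul(tokens, w_k)
    v = matmul(tokens, w_v)
    scale = math.sqrt(len(w_k[0]))
    scores = matmul(q, transpose(k))
    weights = [softmax([s / scale for s in row]) for row in scores]
    attended = matmul(weights, v)
    # Mean-pool the attended tokens into one vector (an assumed pooling choice)
    # that a downstream policy head could consume.
    fused = [sum(col) / len(attended) for col in zip(*attended)]
    return fused, weights


if __name__ == "__main__":
    rng = random.Random(0)
    dim = 8
    # Hypothetical per-modality embeddings standing in for real encoder outputs.
    vision = [rng.uniform(-1, 1) for _ in range(dim)]
    audio = [rng.uniform(-1, 1) for _ in range(dim)]
    touch = [rng.uniform(-1, 1) for _ in range(dim)]
    w_q, w_k, w_v = (random_matrix(dim, dim, rng) for _ in range(3))
    fused, weights = fuse_modalities([vision, audio, touch], w_q, w_k, w_v)
    print("fused feature length:", len(fused))
    for name, row in zip(["vision", "audio", "touch"], weights):
        print(name, "attends to", [round(w, 3) for w in row])
```

Because every token can attend to every other, the fusion can in principle shift weight toward audio or touch when the camera view is uninformative (for example, under occlusion), which is the kind of complementarity the abstract describes; with trained rather than random weights, the attention rows would reflect that.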

updated: Thu Dec 08 2022 05:52:16 GMT+0000 (UTC)

published: Wed Dec 07 2022 18:55:53 GMT+0000 (UTC)

arXiv