Deep Instance Segmentation and Visual Servoing to Play Jenga with a Cost-Effective Robotic System

Luca Marchionna; Giulio Pugliese; Mauro Martini; Simone Angarano; Francesco Salvetti; Marcello Chiaberge

The game of Jenga is an inspiring benchmark for developing innovative manipulation solutions for complex tasks, and it has encouraged the study of novel robotics methods for successfully extracting blocks from the tower. A round of Jenga embeds many traits of complex industrial or surgical manipulation tasks: it requires a multi-step strategy, the combination of visual and tactile data, and highly precise motion of the robotic arm to perform a single block extraction. In this work, we propose a novel cost-effective architecture for playing Jenga with e.DO, a 6-DOF anthropomorphic manipulator manufactured by Comau, a standard depth camera, and an inexpensive monodirectional force sensor. Our solution focuses on a vision-based control strategy that accurately aligns the end-effector with the desired block, enabling block extraction by pushing. To this end, we train an instance segmentation deep learning model on a custom synthetic dataset to segment each piece of the Jenga tower, allowing visual tracking of the desired block's pose during the motion of the manipulator. We integrate the vision-based strategy with a 1D force sensor to detect whether a block can be safely removed by identifying a force threshold value. Our experiments show that this low-cost solution allows e.DO to precisely reach removable blocks and perform up to 14 consecutive extractions.
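The force-threshold check mentioned in the abstract can be illustrated with a minimal sketch. It assumes the 1D sensor yields a sequence of scalar force readings during a push and that a block counts as removable only if no reading exceeds a fixed threshold. The function name, the units, and the example threshold are hypothetical; the abstract gives no numeric value and does not describe the decision logic in this detail.

```python
from typing import Iterable


def is_block_removable(force_readings: Iterable[float], threshold: float) -> bool:
    """Return True if every reading stays at or below the threshold.

    Hypothetical helper: `force_readings` stands in for samples from a
    1D force sensor taken while the end-effector pushes a block, and
    `threshold` is an assumed, empirically chosen limit in the same units.
    """
    for force in force_readings:
        if force > threshold:
            # Resistance is too high: treat the block as load-bearing
            # and stop evaluating further samples.
            return False
    return True


# Illustrative values only (not taken from the paper).
loose_block = [0.10, 0.15, 0.20]
stuck_block = [0.10, 0.40, 0.90]
print(is_block_removable(loose_block, threshold=0.5))
print(is_block_removable(stuck_block, threshold=0.5))
```

In a real system the early return would correspond to aborting the push and choosing another block. That control flow is also an assumption here, not a detail stated in the abstract.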

updated: Tue Nov 15 2022 08:26:50 GMT+0000 (UTC)

published: Tue Nov 15 2022 08:26:50 GMT+0000 (UTC)

arXiv
