Self-Supervised Interactive Object Segmentation Through a Singulation-and-Grasping Approach

Houjian Yu; Changhyun Choi

Instance segmentation of unseen objects is a challenging problem in unstructured environments. To address it, we propose a robot learning approach that actively interacts with novel objects and collects a training label for each object, which is then used to fine-tune the segmentation model and improve its performance, while avoiding the time-consuming process of manually labeling a dataset. The Singulation-and-Grasping (SaG) policy is trained through end-to-end reinforcement learning. Given a cluttered pile of objects, our approach chooses pushing and grasping motions to break up the clutter and performs object-agnostic grasping; the SaG policy takes visual observations and imperfect segmentation as input. We decompose the problem into three subtasks: (1) the object singulation subtask separates the objects from each other, creating more space and thereby easing (2) the collision-free grasping subtask; (3) the mask generation subtask obtains self-labeled ground truth masks using an optical flow-based binary classifier and motion cue post-processing for transfer learning. Our system achieves a 70% singulation success rate in simulated cluttered scenes. Its interactive segmentation achieves 87.8%, 73.9%, and 69.3% average precision for toy blocks, YCB objects in simulation, and real-world novel objects, respectively, outperforming several baselines.
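
As a rough illustration of the motion-cue idea behind the mask generation subtask, the sketch below marks pixels that moved between two frames by thresholding the magnitude of a dense optical flow field. This is not the authors' method: the abstract describes a learned optical flow-based binary classifier, whereas this example substitutes a fixed threshold. The names `motion_mask`, `mask_iou`, and `threshold`, and the synthetic flow field, are assumptions made purely for illustration.

```python
import numpy as np


def motion_mask(flow, threshold=1.0):
    """Return a boolean (H, W) mask of pixels whose flow magnitude exceeds threshold.

    flow: array of shape (H, W, 2) holding per-pixel (dx, dy) displacements.
    threshold: hypothetical cutoff in pixels; a fixed value stands in for the
               learned classifier mentioned in the abstract.
    """
    magnitude = np.linalg.norm(flow, axis=-1)
    return magnitude > threshold


def mask_iou(a, b):
    """Intersection over union of two boolean masks (0.0 if both are empty)."""
    union = np.logical_or(a, b).sum()
    if union == 0:
        return 0.0
    return float(np.logical_and(a, b).sum() / union)


# Synthetic example: a 2x3 patch shifts 3 pixels to the right, the rest is static.
flow = np.zeros((6, 6, 2), dtype=np.float32)
flow[2:4, 2:5] = (3.0, 0.0)
mask = motion_mask(flow)
```

In a real pipeline the flow field would come from an optical flow estimator applied to the images before and after a grasp, and the resulting mask would still need cleanup before serving as a training label.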

updated: Tue Jul 19 2022 15:01:36 GMT+0000 (UTC)

published: Tue Jul 19 2022 15:01:36 GMT+0000 (UTC)

arXiv
