Coarse-to-Fine Active Segmentation of Interactable Parts in Real Scene Images

Ruiqi Wang; Akshay Gadi Patil; Fenggen Yu; Hao Zhang

We introduce the first active learning (AL) framework for high-accuracy instance segmentation of dynamic, interactable parts from RGB images of real indoor scenes. As with most human-in-the-loop approaches, the key criterion for success in AL is to minimize human effort while still attaining high performance. To this end, we employ a transformer-based segmentation network that utilizes a masked-attention mechanism. To tailor the network to our task, we introduce a coarse-to-fine model that first uses object-aware masked attention and then pose-aware masked attention. This design leverages the correlation between interactable parts and object poses and improves the handling of multiple articulated objects in an image. Our coarse-to-fine active segmentation module learns both 2D instance and 3D pose information using the transformer, which supervises the active segmentation and effectively reduces human effort. Our method achieves close to fully accurate (96% and higher) segmentation results on real images, with a 77% time saving over manual effort, where the training data consists of only 16.6% annotated real photographs. Finally, we contribute a dataset of 2,550 real photographs with annotated interactable parts, demonstrating its superior quality and diversity over the current best alternative.
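
The abstract relies on a masked-attention mechanism, so a small illustration of the general idea may help. The sketch below is a generic single-head masked attention in plain Python, not the authors' network: the function name `masked_attention`, its arguments, and the fallback for an empty mask are all assumptions made for illustration. Each query is only allowed to attend to positions where its mask is true, which is how attention can be restricted to a coarse region (for example an object) before a finer stage.

```python
import math


def masked_attention(queries, keys, values, mask):
    """Generic single-head masked attention (illustrative, hypothetical API).

    queries: list of N query vectors, each of length d
    keys:    list of M key vectors, each of length d
    values:  list of M value vectors
    mask:    N x M booleans; True means query i may attend to position j
    """
    d = len(queries[0])
    outputs = []
    for q, allowed in zip(queries, mask):
        # Scaled dot-product scores against every key.
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        # Assumption: if the mask is empty, fall back to unmasked attention.
        if any(allowed):
            scores = [s if ok else -math.inf for s, ok in zip(scores, allowed)]
        # Numerically stable softmax; masked positions get weight 0.
        top = max(scores)
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        weights = [w / total for w in weights]
        # Weighted sum of the value vectors.
        outputs.append(
            [sum(w * v[c] for w, v in zip(weights, values)) for c in range(len(values[0]))]
        )
    return outputs


if __name__ == "__main__":
    keys = [[1.0, 0.0], [0.0, 1.0]]
    values = [[1.0, 2.0], [3.0, 4.0]]
    # The single query may only look at the first position.
    print(masked_attention([[1.0, 0.0]], keys, values, [[True, False]]))
```

In a coarse-to-fine setting, one would presumably compute a first mask from a coarse prediction and pass it to a later attention layer; how the paper builds its object-aware and pose-aware masks is not specified in the abstract.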

updated: Tue Mar 21 2023 01:30:20 GMT+0000 (UTC)

published: Tue Mar 21 2023 01:30:20 GMT+0000 (UTC)

arXiv
