A Dynamic Keypoints Selection Network for 6DoF Pose Estimation

Haowen Sun; Taiyong Wang

The 6DoF pose estimation problem aims to estimate the rotation and translation parameters between two coordinate systems, such as the object world coordinate system and the camera world coordinate system. Although deep learning has brought some advances, how to make full use of scene information remains an open problem. Prior works tackle the problem by pixel-wise feature fusion but need to randomly select numerous points from images, which cannot satisfy the demands of fast inference and accurate pose estimation simultaneously. In this work, we present a novel deep neural network based on dynamic keypoints selection, designed for 6DoF pose estimation from a single RGBD image. Our network includes three parts: instance semantic segmentation, edge points detection, and 6DoF pose estimation. Given an RGBD image, our network is trained to predict the category of each pixel and its translation to the edge points and center points. A least-squares fitting method is then applied to estimate the 6DoF pose parameters. Specifically, we propose a dynamic keypoints selection algorithm that chooses keypoints from the foreground feature map, which allows us to leverage both geometric and appearance information. During 6DoF pose estimation, we use the instance semantic segmentation result to filter out background points and use only foreground points for edge points detection and 6DoF pose estimation. Experiments on two commonly used 6DoF estimation benchmark datasets, YCB-Video and LineMoD, demonstrate that our method outperforms the state-of-the-art methods and is significantly more time-efficient than other methods of the same category.
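The least-squares fitting step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not say which solver is used, so this assumes a standard SVD-based (Kabsch) rigid alignment between keypoints defined in the object frame and their predicted positions in the camera frame. The function name `fit_rigid_transform`, the use of NumPy, and the toy keypoints are hypothetical and are not taken from the paper.

```python
import numpy as np


def fit_rigid_transform(object_pts, camera_pts):
    """Least-squares rigid fit: find R, t with camera_pts ~= object_pts @ R.T + t.

    Assumption: an SVD-based (Kabsch) solver; the paper's exact fitting
    procedure is not given in the abstract.
    """
    object_pts = np.asarray(object_pts, dtype=float)
    camera_pts = np.asarray(camera_pts, dtype=float)

    # Centre both point sets so that only the rotation remains to be solved.
    mu_obj = object_pts.mean(axis=0)
    mu_cam = camera_pts.mean(axis=0)

    # 3x3 cross-covariance between the centred point sets.
    H = (object_pts - mu_obj).T @ (camera_pts - mu_cam)
    U, _, Vt = np.linalg.svd(H)

    # Flip the last axis if needed so the result is a rotation, not a reflection.
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = mu_cam - R @ mu_obj
    return R, t


# Toy check with hypothetical keypoints: 8 box corners plus the centre point.
corners = np.array([[x, y, z] for x in (-1, 1) for y in (-1, 1) for z in (-1, 1)],
                   dtype=float) * 0.05
keypoints_obj = np.vstack([corners, np.zeros((1, 3))])

angle = np.deg2rad(30.0)
R_true = np.array([[np.cos(angle), -np.sin(angle), 0.0],
                   [np.sin(angle),  np.cos(angle), 0.0],
                   [0.0,            0.0,           1.0]])
t_true = np.array([0.1, -0.2, 0.5])

# Stand-in for the keypoint positions a network would predict in the camera frame.
keypoints_cam = keypoints_obj @ R_true.T + t_true

R_est, t_est = fit_rigid_transform(keypoints_obj, keypoints_cam)
```

With noise-free correspondences, `R_est` and `t_est` recover the ground-truth pose up to floating-point error. With noisy keypoint predictions, the same routine returns the pose that minimises the sum of squared distances between the two point sets.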

updated: Sun Oct 24 2021 09:58:56 GMT+0000 (UTC)

published: Sun Oct 24 2021 09:58:56 GMT+0000 (UTC)

arXiv
