KDFNet: Learning Keypoint Distance Field for 6D Object Pose Estimation

Xingyu Liu; Shun Iwase; Kris M. Kitani

We present KDFNet, a novel method for 6D object pose estimation from RGB images. To handle occlusion, many recent works localize 2D keypoints through pixel-wise voting and then solve a Perspective-n-Point (PnP) problem for pose estimation, an approach that achieves leading performance. However, such a voting process is direction-based and cannot handle long, thin objects, for which the intersections of the voted directions cannot be found robustly. To address this problem, we propose a novel continuous representation of projected 2D keypoint locations called the Keypoint Distance Field (KDF). The KDF is formulated as a 2D array in which each element stores the 2D Euclidean distance between the corresponding image pixel and a specified projected 2D keypoint. We use a fully convolutional neural network to regress the KDF for each keypoint. Using this KDF encoding of the projected object keypoint locations, we propose a distance-based voting scheme that localizes the keypoints by calculating circle intersections in a RANSAC fashion. We validate the design choices of our framework through extensive ablation experiments. Our proposed method achieves state-of-the-art performance on the Occlusion LINEMOD dataset, with an average ADD(-S) accuracy of 50.3%, and on the mug subset of the TOD dataset, with an average ADD accuracy of 75.72%. Extensive experiments and visualizations demonstrate that the proposed method can robustly estimate the 6D pose in challenging scenarios, including occlusion.
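The KDF encoding and the distance-based voting described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names (`keypoint_distance_field`, `circle_intersections`, `vote_keypoint`), the inlier tolerance `tol`, the number of hypotheses `n_hyp`, and the use of a ground-truth field in place of a network prediction are all assumptions made for illustration. Each pixel with a distance value defines a circle on which the keypoint must lie; two such circles give up to two candidate locations, and the candidate supported by the most pixels is kept.

```python
import numpy as np

def keypoint_distance_field(height, width, keypoint):
    """Return a (height, width) array whose element [y, x] is the Euclidean
    distance from pixel (x, y) to the projected keypoint (kx, ky)."""
    ys, xs = np.mgrid[0:height, 0:width]
    return np.hypot(xs - keypoint[0], ys - keypoint[1])

def circle_intersections(c0, r0, c1, r1):
    """Return the 0 or 2 intersection points of two circles (centers c0, c1)."""
    delta = c1 - c0
    d = np.hypot(delta[0], delta[1])
    # No usable intersection: identical centers, separate or nested circles.
    if d == 0 or d > r0 + r1 or d < abs(r0 - r1):
        return []
    a = (r0**2 - r1**2 + d**2) / (2 * d)     # distance from c0 to the chord
    h = np.sqrt(max(r0**2 - a**2, 0.0))      # half-length of the chord
    mid = c0 + a * delta / d
    perp = np.array([-delta[1], delta[0]]) / d
    return [mid + h * perp, mid - h * perp]

def vote_keypoint(pixels, dists, n_hyp=200, tol=0.5, seed=0):
    """RANSAC-style voting (hypothetical parameters): sample pixel pairs,
    intersect their circles, and keep the candidate with the most inliers."""
    rng = np.random.default_rng(seed)
    best, best_count = None, -1
    for _ in range(n_hyp):
        i, j = rng.choice(len(pixels), size=2, replace=False)
        for cand in circle_intersections(pixels[i], dists[i], pixels[j], dists[j]):
            # A pixel votes for the candidate if its stored distance matches.
            resid = np.abs(np.linalg.norm(pixels - cand, axis=1) - dists)
            count = int(np.sum(resid < tol))
            if count > best_count:
                best, best_count = cand, count
    return best, best_count

# Usage: a noise-free field stands in for the network output (assumption).
true_kp = (12.5, 7.25)
field = keypoint_distance_field(32, 48, true_kp)
ys, xs = np.mgrid[8:24, 10:30]               # hypothetical object mask region
pixels = np.stack([xs.ravel(), ys.ravel()], axis=1).astype(float)
dists = field[ys.ravel(), xs.ravel()]
estimate, votes = vote_keypoint(pixels, dists, tol=1e-6)
```

Because every pixel constrains the keypoint by distance rather than by direction, the candidate locations do not depend on finding a stable intersection of nearly parallel lines, which is the failure case the abstract attributes to direction-based voting on long, thin objects. With a real, noisy prediction, `tol` would need to be chosen relative to the regression error.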

updated: Tue Sep 21 2021 12:17:24 GMT+0000 (UTC)

published: Tue Sep 21 2021 12:17:24 GMT+0000 (UTC)

arXiv
