GenPose: Generative Category-level Object Pose Estimation via Diffusion Models

Jiyao Zhang; Mingdong Wu; Hao Dong

Object pose estimation plays a vital role in embodied AI and computer vision, enabling intelligent agents to comprehend and interact with their surroundings. Despite the practicality of category-level pose estimation, current approaches struggle with partially observed point clouds, a challenge known as the multi-hypothesis issue. In this study, we propose a novel solution by reframing category-level object pose estimation as conditional generative modeling, departing from traditional point-to-point regression. Leveraging score-based diffusion models, we estimate object poses by sampling candidates from the diffusion model and aggregating them through a two-step process: filtering out outliers via likelihood estimation and subsequently mean-pooling the remaining candidates. To avoid the costly integration process when estimating the likelihood, we introduce an alternative method that trains an energy-based model from the original score-based model, enabling end-to-end likelihood estimation. Our approach achieves state-of-the-art performance on the REAL275 dataset, surpassing 50% and 60% on the strict 5°2cm and 5°5cm metrics, respectively. Furthermore, our method demonstrates strong generalizability to novel categories sharing similar symmetric properties without fine-tuning and can readily adapt to object pose tracking tasks, yielding results comparable to the current state-of-the-art baselines.
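The two-step aggregation described above (likelihood-based outlier filtering followed by mean-pooling) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, the `keep_fraction` parameter, and the choice to average rotations by projecting the arithmetic mean of the rotation matrices back onto SO(3) are all assumptions made for illustration. The candidates and their log-likelihood scores are assumed to come from an already trained sampler and energy model, which are not shown.

```python
import numpy as np


def project_to_so3(matrix):
    """Return the rotation matrix closest (in Frobenius norm) to `matrix`."""
    u, _, vt = np.linalg.svd(matrix)
    # Flip the last singular direction if needed so the determinant is +1.
    d = 1.0 if np.linalg.det(u @ vt) >= 0 else -1.0
    return u @ np.diag([1.0, 1.0, d]) @ vt


def aggregate_poses(rotations, translations, log_likelihoods, keep_fraction=0.6):
    """Filter low-likelihood pose candidates, then mean-pool the rest.

    rotations:       (K, 3, 3) candidate rotation matrices
    translations:    (K, 3) candidate translations
    log_likelihoods: (K,) score per candidate (higher means more likely)
    keep_fraction:   hypothetical hyperparameter, share of candidates kept
    """
    rotations = np.asarray(rotations, dtype=float)
    translations = np.asarray(translations, dtype=float)
    scores = np.asarray(log_likelihoods, dtype=float)

    # Step 1: keep only the highest-scoring candidates (at least one).
    num_keep = max(1, int(round(keep_fraction * len(scores))))
    keep = np.argsort(scores)[-num_keep:]

    # Step 2: mean-pool. Translations average directly; the averaged
    # rotation matrix is projected back onto SO(3) (an assumed choice).
    rotation = project_to_so3(rotations[keep].mean(axis=0))
    translation = translations[keep].mean(axis=0)
    return rotation, translation
```

Calling `aggregate_poses` with, say, three candidates whose third entry has a much lower score discards that entry before averaging, so a single implausible sample does not pull the pooled estimate away from the consistent ones.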

updated: Mon Dec 25 2023 08:03:49 GMT+0000 (UTC)

published: Sun Jun 18 2023 11:45:42 GMT+0000 (UTC)

arXiv
