IST-Net: Prior-free Category-level Pose Estimation with Implicit Space Transformation

Jianhui Liu; Yukang Chen; Xiaoqing Ye; Xiaojuan Qi

Category-level 6D pose estimation aims to predict the poses and sizes of unseen objects from a specific category. Thanks to prior deformation, which explicitly adapts a category-specific 3D prior (i.e., a 3D template) to a given object instance, prior-based methods have attained great success and become a major research stream. However, obtaining category-specific priors requires collecting a large number of 3D models, which is labor-intensive and often not feasible in practice. This motivates us to investigate whether priors are necessary to make prior-based methods effective. Our empirical study shows that the 3D prior itself is not what accounts for the high performance. The key point is actually the explicit deformation process, which aligns camera and world coordinates under the supervision of 3D models in world space (also called canonical space). Inspired by these observations, we introduce a simple prior-free implicit space transformation network, namely IST-Net, to transform camera-space features into their world-space counterparts and build correspondences between them in an implicit manner, without relying on 3D priors. In addition, we design camera-space and world-space enhancers to enrich the features with pose-sensitive information and geometrical constraints, respectively. Albeit simple, IST-Net achieves state-of-the-art performance with a prior-free design, along with the top inference speed on the REAL275 benchmark. Our code and models are available at https://github.com/CVMI-Lab/IST-Net.
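The relation between camera coordinates and world (canonical) coordinates mentioned in the abstract can be illustrated with a small sketch. It assumes the similarity-transform convention commonly used in category-level pose estimation, in which a canonical point is scaled, rotated, and translated into camera space. The abstract does not state this formula, and the function names and pose values below are hypothetical. IST-Net is described as learning the transformation implicitly on features, so this sketch only shows the geometry being aligned, not the network itself.

```python
import math


def rotation_z(angle_rad):
    """Return a 3x3 rotation about the z-axis as nested lists."""
    c, s = math.cos(angle_rad), math.sin(angle_rad)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def world_to_camera(points, rotation, translation, scale):
    """Apply p_cam = scale * R @ p_world + t to each 3D point."""
    out = []
    for p in points:
        out.append(tuple(
            scale * sum(rotation[i][j] * p[j] for j in range(3)) + translation[i]
            for i in range(3)
        ))
    return out


def camera_to_world(points, rotation, translation, scale):
    """Invert the transform: p_world = R^T @ (p_cam - t) / scale."""
    out = []
    for p in points:
        d = [p[j] - translation[j] for j in range(3)]
        out.append(tuple(
            # rotation[j][i] indexes the transpose of R
            sum(rotation[j][i] * d[j] for j in range(3)) / scale
            for i in range(3)
        ))
    return out


# Hypothetical pose and size, chosen only for illustration.
R = rotation_z(math.pi / 6)
t = (0.1, -0.2, 0.8)
s = 0.25

# Hypothetical points in canonical (world) space.
canonical = [(0.5, 0.0, 0.0), (0.0, 0.5, 0.0), (-0.3, 0.2, 0.4)]
observed = world_to_camera(canonical, R, t, s)
recovered = camera_to_world(observed, R, t, s)
```

Here `recovered` matches `canonical` up to floating-point error, which is the sense in which a known pose and size place an observed point cloud back in canonical space.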

updated: Wed Jul 19 2023 16:11:13 GMT+0000 (UTC)

published: Thu Mar 23 2023 17:48:12 GMT+0000 (UTC)

arXiv
