PIZZA: A Powerful Image-only Zero-Shot Zero-CAD Approach to 6 DoF Tracking

Van Nguyen Nguyen; Yuming Du; Yang Xiao; Michael Ramamonjisoa; Vincent Lepetit

Estimating the relative pose of a new object without prior knowledge is a hard problem, yet it is an ability much needed in robotics and Augmented Reality. We present a method for tracking the 6D motion of objects in RGB video sequences when neither training images nor the 3D geometry of the objects are available. In contrast to previous works, our method can therefore handle unknown objects in an open world instantly, without requiring any prior information or a specific training phase. We consider two architectures: one based on two frames, and the other relying on a Transformer Encoder, which can exploit an arbitrary number of past frames. We train our architectures using only synthetic renderings with domain randomization. Our results on challenging datasets are on par with previous works that require much more information (training images of the target objects, 3D models, and/or depth data). Our source code is available at https://github.com/nv-nguyen/pizza
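
To make the idea of tracking 6D motion from relative poses concrete, here is a minimal sketch of how frame-to-frame estimates could be chained into a track. It is an illustration under stated assumptions, not the authors' implementation: the helper names (`matmul`, `rot_z`, `accumulate`) are hypothetical, the relative poses are hand-built rather than predicted by a network, and the composition convention (each new relative transform is left-multiplied onto the accumulated pose) is an assumption.

```python
import math


def matmul(a, b):
    """Multiply two 4x4 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def rot_z(angle_rad, translation=(0.0, 0.0, 0.0)):
    """Hypothetical helper: 4x4 rigid transform, rotation about z plus translation."""
    c, s = math.cos(angle_rad), math.sin(angle_rad)
    tx, ty, tz = translation
    return [[c, -s, 0.0, tx],
            [s, c, 0.0, ty],
            [0.0, 0.0, 1.0, tz],
            [0.0, 0.0, 0.0, 1.0]]


def accumulate(relative_poses):
    """Chain relative transforms (frame t-1 to frame t) into poses relative to frame 0.

    Assumed convention: pose_t = relative_t @ pose_{t-1}, with pose_0 = identity.
    """
    identity = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
    track = [identity]
    for rel in relative_poses:
        track.append(matmul(rel, track[-1]))
    return track


# Stand-in for per-frame predictions: three identical steps of 30 degrees
# about z, each moving 0.1 units along z.
steps = [rot_z(math.radians(30), (0.0, 0.0, 0.1)) for _ in range(3)]
track = accumulate(steps)
```

In a real tracker the entries of `steps` would come from the pose estimator, and small per-frame errors would accumulate along the chain, which is one reason exploiting more than two frames can help.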

updated: Sat Oct 01 2022 20:38:52 GMT+0000 (UTC)

published: Thu Sep 15 2022 19:55:13 GMT+0000 (UTC)

arXiv
