Convolutional Cross-View Pose Estimation

Zimin Xia; Olaf Booij; Julian F. P. Kooij

We propose a novel end-to-end method for cross-view pose estimation. Given a ground-level query image and an aerial image that covers the query's local neighborhood, the 3 Degrees-of-Freedom camera pose of the query is estimated by matching its image descriptor to descriptors of local regions within the aerial image. The orientation-aware descriptors are obtained by using a translationally equivariant convolutional ground image encoder and contrastive learning. The Localization Decoder produces a dense probability distribution in a coarse-to-fine manner with a novel Localization Matching Upsampling module. A smaller Orientation Decoder produces a vector field to condition the orientation estimate on the localization. Our method is validated on the VIGOR and KITTI datasets, where it surpasses the state-of-the-art baseline by 72% and 36%, respectively, in median localization error at comparable orientation estimation accuracy. The predicted probability distribution can represent localization ambiguity and makes it possible to reject likely erroneous predictions. Without re-training, the model can run inference on ground images with different fields of view and can use orientation priors if available. On the Oxford RobotCar dataset, our method reliably estimates the ego-vehicle's pose over time, achieving a median localization error under 1 meter and a median orientation error of around 1 degree at 14 FPS.
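
To make the matching idea in the abstract concrete, the following is a minimal toy sketch, not the authors' implementation. It assumes the descriptors and the orientation vector field are already given as plain lists (in the paper they come from learned encoders and decoders), it works at a single resolution instead of coarse-to-fine, and it does not model the Localization Matching Upsampling module. All function names and the `temperature` parameter are hypothetical and chosen only for illustration.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length descriptor vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b + 1e-12)


def localize(ground_desc, aerial_descs, temperature=0.1):
    """Match one ground descriptor against an H x W grid of aerial descriptors.

    Returns a dense probability grid (softmax over all cells) and the
    (row, col) of the most likely cell. `temperature` is an assumed knob.
    """
    scores = [[cosine(ground_desc, d) / temperature for d in row]
              for row in aerial_descs]
    top = max(max(row) for row in scores)  # subtract max for stability
    exps = [[math.exp(s - top) for s in row] for row in scores]
    total = sum(sum(row) for row in exps)
    probs = [[e / total for e in row] for row in exps]
    cells = [(r, c) for r in range(len(probs)) for c in range(len(probs[0]))]
    best = max(cells, key=lambda rc: probs[rc[0]][rc[1]])
    return probs, best


def orientation_at(vector_field, cell):
    """Read the orientation (degrees) from a 2D vector field at one cell,
    so the orientation estimate is conditioned on the chosen location."""
    r, c = cell
    vx, vy = vector_field[r][c]
    return math.degrees(math.atan2(vy, vx))


# Toy 2 x 2 example with hand-written descriptors and vectors.
ground = [1.0, 0.0]
aerial = [[[0.0, 1.0], [1.0, 0.0]],
          [[-1.0, 0.0], [0.0, -1.0]]]
field = [[(1.0, 0.0), (0.0, 1.0)],
         [(-1.0, 0.0), (0.0, -1.0)]]

probs, best = localize(ground, aerial)
heading = orientation_at(field, best)
print(best, round(heading, 3))
```

Because the output is a full probability grid and not a single point, a flat or multi-peaked grid can be used as a signal of localization ambiguity, for example by rejecting predictions whose peak probability falls below a chosen threshold.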

updated: Thu Jun 15 2023 09:12:09 GMT+0000 (UTC)

published: Thu Mar 09 2023 13:52:28 GMT+0000 (UTC)

arXiv
