Deep Camera Pose Regression Using Pseudo-LiDAR

Ali Raza; Lazar Lolic; Shahmir Akhter; Alfonso Dela Cruz; Michael Liut

Accurate and robust large-scale localization is an integral component of active research areas such as autonomous vehicles and augmented reality. To this end, many learning algorithms have been proposed that predict a 6DOF camera pose from RGB or RGB-D images. However, previous methods that incorporate depth typically treat it the same way as RGB data, often appending depth maps as additional channels to RGB images and passing them through convolutional neural networks (CNNs). In this paper, we show that converting depth maps into pseudo-LiDAR signals, previously shown to be useful for 3D object detection, yields a better representation for camera localization: the depth map is projected into a point cloud from which the 6DOF camera pose can be accurately determined. We demonstrate this by first comparing the localization accuracy of a network operating exclusively on pseudo-LiDAR representations with that of networks operating exclusively on depth maps. We then propose FusionLoc, a novel architecture that uses pseudo-LiDAR to regress a 6DOF camera pose. FusionLoc is a dual-stream neural network that aims to remedy common issues with typical 2D CNNs operating on RGB-D images. Its results are compared against various other state-of-the-art deep pose regression implementations on the 7 Scenes dataset. FusionLoc performs better than a number of other camera localization methods; notably, it is on average 0.33 m and 4.35° more accurate than RGB-D PoseNet. By demonstrating the validity of using pseudo-LiDAR signals instead of depth maps for localization, this work raises new considerations for implementing large-scale localization systems.
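The abstract does not spell out how a depth map becomes a pseudo-LiDAR signal. A common way to do this, shown here as a minimal sketch rather than the authors' implementation, is to back-project each pixel through a pinhole camera model. The function name `depth_to_pseudo_lidar` and the intrinsics `fx`, `fy`, `cx`, `cy` used below are illustrative assumptions; real values would come from the sensor calibration.

```python
import numpy as np


def depth_to_pseudo_lidar(depth, fx, fy, cx, cy):
    """Back-project a depth map (H x W, in metres) to an (N, 3) point cloud.

    Assumes a pinhole camera with focal lengths (fx, fy) and principal
    point (cx, cy), all in pixels. Pixels whose depth is non-positive or
    non-finite are treated as invalid and dropped.
    """
    depth = np.asarray(depth, dtype=np.float64)
    h, w = depth.shape

    # Pixel coordinate grids: u runs along columns, v along rows.
    u, v = np.meshgrid(np.arange(w), np.arange(h))

    valid = np.isfinite(depth) & (depth > 0)
    z = depth[valid]
    x = (u[valid] - cx) * z / fx
    y = (v[valid] - cy) * z / fy

    # One row per valid pixel: (x, y, z) in the camera frame.
    return np.stack([x, y, z], axis=1)


# Toy example: a flat surface 2 m away, with one invalid pixel.
# The intrinsics here are made up for illustration only.
depth = np.full((4, 6), 2.0)
depth[0, 0] = 0.0
points = depth_to_pseudo_lidar(depth, fx=500.0, fy=500.0, cx=3.0, cy=2.0)
print(points.shape)
```

The resulting array is an unordered set of 3D points, so it can be consumed by a point-cloud network instead of being stacked onto the RGB image as a fourth channel, which is the distinction the abstract draws.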

updated: Mon Feb 28 2022 20:30:37 GMT+0000 (UTC)

published: Mon Feb 28 2022 20:30:37 GMT+0000 (UTC)

arXiv
