Dense Prediction Transformer for Scale Estimation in Monocular Visual Odometry

André O. Françani; Marcos R. O. A. Maximo

Monocular visual odometry estimates the position of an agent from the images of a single camera, and it is applied in autonomous vehicles, medical robots, and augmented reality. However, monocular systems suffer from scale ambiguity because 2D frames lack depth information. This paper contributes an application of the dense prediction transformer model to scale estimation in monocular visual odometry systems. Experimental results show that the scale drift of monocular systems can be reduced through accurate depth-map estimation by this model, achieving performance competitive with the state of the art on a visual odometry benchmark.
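As a rough illustration of how a predicted depth map can resolve scale, the sketch below uses a common median-ratio heuristic. It is not necessarily the procedure used in the paper. It assumes that some feature points have both a depth from the depth model (taken here to be in metres) and an up-to-scale depth triangulated by the monocular pipeline. The function names `estimate_scale` and `rescale_translation`, the `min_depth` threshold, and the sample values are all hypothetical.

```python
from statistics import median


def estimate_scale(predicted_depths, triangulated_depths, min_depth=1e-6):
    """Median ratio of predicted depth to up-to-scale triangulated depth.

    Assumption: both sequences refer to the same feature points, in the
    same order. Pairs with a non-positive or near-zero depth are skipped.
    """
    ratios = [
        d_pred / d_tri
        for d_pred, d_tri in zip(predicted_depths, triangulated_depths)
        if d_pred > min_depth and d_tri > min_depth
    ]
    if not ratios:
        raise ValueError("no valid depth pairs to estimate scale from")
    # The median keeps a few bad triangulations from dominating the estimate.
    return median(ratios)


def rescale_translation(translation, scale):
    """Apply the estimated scale to an up-to-scale translation vector."""
    return [scale * t for t in translation]


# Hypothetical values: three consistent points and one outlier.
predicted = [10.0, 20.0, 30.0, 8.0]    # from the depth model
triangulated = [1.0, 2.0, 3.0, 4.0]    # from two-view triangulation
scale = estimate_scale(predicted, triangulated)
scaled_t = rescale_translation([0.1, 0.0, 0.05], scale)
```

Re-estimating the scale for each frame pair in this way, rather than fixing it once, is one way to limit the accumulation of scale drift along the trajectory.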

updated: Tue Oct 04 2022 16:29:21 GMT+0000 (UTC)

published: Tue Oct 04 2022 16:29:21 GMT+0000 (UTC)

arXiv
