DF-VO: What Should Be Learnt for Visual Odometry?

Huangying Zhan; Chamara Saroj Weerasekera; Jia-Wang Bian; Ravi Garg; Ian Reid

Multi-view geometry-based methods have dominated monocular visual odometry (VO) for the past few decades because of their superior performance, but they remain vulnerable to dynamic and low-texture scenes. More importantly, monocular methods suffer from scale drift, i.e., errors accumulate over time. Recent studies show that deep neural networks can learn scene depth and relative camera pose in a self-supervised manner, without ground-truth labels. More surprisingly, they show that well-trained networks produce scale-consistent predictions over long videos, although their accuracy is still inferior to traditional methods because they ignore geometric information. Building on recent progress in computer vision, we design a simple yet robust VO system, DF-VO, that integrates multi-view geometry with deep learning on Depth and optical Flow. In this work, a) we propose a method to carefully sample high-quality correspondences from deep flows and recover accurate camera poses with a geometric module; b) we address the scale-drift issue by aligning geometrically triangulated depths to the scale-consistent deep depths, taking dynamic scenes into account. Comprehensive ablation studies show the effectiveness of the proposed method, and extensive evaluation results show the state-of-the-art performance of our system, e.g., Ours (1.652%) vs. ORB-SLAM (3.247%) in terms of translation error on the KITTI Odometry benchmark. Source code is publicly available at: https://github.com/Huangying-Zhan/DF-VO.
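
The scale alignment in b) can be illustrated with a minimal sketch. It assumes the geometric module returns a unit-norm translation together with depths triangulated in that same arbitrary unit, and it uses a median of per-point depth ratios as the robust estimator. The median is a stand-in chosen here for simplicity (it tolerates a minority of outliers such as points on moving objects) and is not claimed to be the estimator used in DF-VO. The function names `estimate_scale` and `rescale_translation` are hypothetical.

```python
import numpy as np


def estimate_scale(triangulated_depth, predicted_depth, min_depth=1e-3):
    """Return s such that s * triangulated_depth approximates predicted_depth.

    Assumption: a median of per-point ratios, used here as a simple robust
    estimator; the paper's actual alignment procedure may differ.
    """
    tri = np.asarray(triangulated_depth, dtype=float).ravel()
    pred = np.asarray(predicted_depth, dtype=float).ravel()
    # Keep only points with finite, positive depth in both sources.
    valid = (
        np.isfinite(tri) & np.isfinite(pred)
        & (tri > min_depth) & (pred > min_depth)
    )
    if not valid.any():
        raise ValueError("no valid depth pairs to align")
    return float(np.median(pred[valid] / tri[valid]))


def rescale_translation(t_unit, scale):
    """Scale an up-to-scale translation so it matches the deep-depth scale."""
    return scale * np.asarray(t_unit, dtype=float)


# Toy data: three consistent points, one failed triangulation (depth 0),
# and one outlier standing in for a point on a moving object.
tri = np.array([2.0, 4.0, 5.0, 0.0, 10.0])
pred = np.array([1.0, 2.0, 2.5, 3.0, 50.0])
s = estimate_scale(tri, pred)
t_scaled = rescale_translation([0.0, 0.0, 1.0], s)
```

In this toy case the three consistent points outvote the outlier, so the recovered scale follows them rather than the inflated ratio of the fifth point.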

updated: Mon Mar 01 2021 11:50:39 GMT+0000 (UTC)

published: Mon Mar 01 2021 11:50:39 GMT+0000 (UTC)

arXiv
