Crafting Monocular Cues and Velocity Guidance for Self-Supervised Multi-Frame Depth Learning

Xiaofeng Wang; Zheng Zhu; Guan Huang; Xu Chi; Yun Ye; Ziwei Chen; Xingang Wang

Self-supervised monocular methods can efficiently learn depth for weakly textured surfaces and reflective objects. However, their accuracy is limited by the inherent ambiguity of monocular geometric modeling. In contrast, multi-frame depth estimation methods improve accuracy by building on Multi-View Stereo (MVS), which directly exploits geometric constraints. Unfortunately, MVS often struggles with texture-less regions, non-Lambertian surfaces, and moving objects, especially in real-world video sequences without known camera motion or depth supervision. We therefore propose MOVEDepth, which exploits MOnocular cues and VElocity guidance to improve multi-frame Depth learning. Unlike existing methods that enforce consistency between MVS depth and monocular depth, MOVEDepth boosts multi-frame depth learning by directly addressing the inherent problems of MVS. The key to our approach is to use monocular depth as a geometric prior when constructing the MVS cost volume, and to adjust the depth candidates of the cost volume under the guidance of the predicted camera velocity. We further fuse monocular depth and MVS depth by learning uncertainty in the cost volume, which yields depth estimates that are robust to ambiguity in multi-view geometry. Extensive experiments show that MOVEDepth achieves state-of-the-art performance: compared with Monodepth2 and PackNet, it improves depth accuracy by a relative 20% and 19.8%, respectively, on the KITTI benchmark. MOVEDepth also generalizes to the more challenging DDAD benchmark, outperforming ManyDepth by a relative 7.2%. The code is available at https://github.com/JeffWang987/MOVEDepth.
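The two core ideas in the abstract (depth candidates placed around a monocular prior with a velocity-dependent search range, and uncertainty-based fusion of monocular and MVS depth) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names `depth_candidates` and `fuse_depths`, the parameters `base_ratio` and `ref_speed`, the linear scaling of the search range with speed, and the convex blending rule are all assumptions made for illustration. The released code at the URL above is the authoritative reference.

```python
import numpy as np


def depth_candidates(mono_depth, speed, num_bins=16, base_ratio=0.1, ref_speed=1.0):
    """Per-pixel depth hypotheses centred on a monocular depth prior.

    mono_depth: (H, W) array of positive monocular depth estimates.
    speed:      scalar magnitude of the predicted camera translation
                between frames (units are hypothetical).
    Returns an array of shape (num_bins, H, W).
    """
    # Assumption: the relative search range grows linearly with camera
    # speed (a larger baseline makes triangulation more informative) and
    # is clipped so it never collapses or explodes.
    ratio = base_ratio * np.clip(speed / ref_speed, 0.25, 4.0)
    # Relative offsets, symmetric around the monocular prior.
    offsets = np.linspace(-ratio, ratio, num_bins)
    # Broadcast (num_bins, 1, 1) against (1, H, W).
    return mono_depth[None] * (1.0 + offsets[:, None, None])


def fuse_depths(mono_depth, mvs_depth, uncertainty):
    """Blend monocular and MVS depth with a per-pixel uncertainty map.

    uncertainty: (H, W) array in [0, 1]; values near 1 mean the MVS
    cost volume is unreliable, so the monocular depth dominates.
    Assumption: a simple convex combination stands in for the learned fusion.
    """
    u = np.clip(uncertainty, 0.0, 1.0)
    return u * mono_depth + (1.0 - u) * mvs_depth


if __name__ == "__main__":
    mono = np.full((2, 3), 10.0)      # toy monocular depth map
    cands = depth_candidates(mono, speed=2.0, num_bins=5)
    print(cands.shape)                # one depth plane per bin

    mvs = np.full((2, 3), 12.0)       # toy MVS depth map
    unc = np.full((2, 3), 0.25)       # mostly trust MVS
    print(fuse_depths(mono, mvs, unc))
```

In this sketch a slow or nearly static camera narrows the candidate range, keeping the MVS hypotheses close to the monocular prior, while the uncertainty map decides per pixel which of the two estimates to trust.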

updated: Fri Aug 19 2022 06:32:06 GMT+0000 (UTC)

published: Fri Aug 19 2022 06:32:06 GMT+0000 (UTC)

arXiv
