DS-Depth: Dynamic and Static Depth Estimation via a Fusion Cost Volume

Xingyu Miao; Yang Bai; Haoran Duan; Yawen Huang; Fan Wan; Xinxing Xu; Yang Long; Yefeng Zheng

Self-supervised monocular depth estimation methods typically rely on the reprojection error to capture geometric relationships between successive frames in static environments. However, this assumption does not hold for dynamic objects in the scene, which leads to errors such as feature mismatch and occlusion during the view synthesis stage and can significantly reduce the accuracy of the generated depth maps. To address this problem, we propose a novel dynamic cost volume that exploits residual optical flow to describe moving objects, correcting the regions that are incorrectly occluded in the static cost volumes used in previous work. Nevertheless, the dynamic cost volume inevitably introduces extra occlusions and noise, so we alleviate this by designing a fusion module that lets the static and dynamic cost volumes compensate for each other. In other words, occlusions in the static volume are refined by the dynamic volume, and incorrect information in the dynamic volume is eliminated by the static volume. Furthermore, we propose a pyramid distillation loss to reduce the inaccuracy of the photometric error at low resolutions, and an adaptive photometric error loss to moderate the direction in which large gradients flow in occluded regions. We conducted extensive experiments on the KITTI and Cityscapes datasets, and the results demonstrate that our model outperforms previously published baselines for self-supervised monocular depth estimation.
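The abstract describes the static and dynamic cost volumes only at a high level. The toy sketch below is one way to picture the distinction, under loudly stated assumptions: a 1-D scanline instead of an image, integer pixel shifts standing in for depth hypotheses, nearest-neighbour warping instead of bilinear sampling, and a residual flow that is simply given rather than estimated by a network. The names `sample` and `cost_volume`, the spurious residual at one background pixel, and the mask-based blend are all hypothetical illustrations; in particular, the paper's fusion module is learned, and the fixed mask here is only a stand-in for the idea that each volume corrects the other's errors.

```python
import numpy as np


def sample(row, coords):
    """Nearest-neighbour lookup with border clamping (stand-in for bilinear warping)."""
    idx = np.clip(np.rint(coords).astype(int), 0, len(row) - 1)
    return row[idx]


def cost_volume(ref, src, shifts, residual=None):
    """L1 matching cost of shape (len(shifts), width).

    Static volume: hypothesis d samples src at x - d (rigid camera motion only).
    Dynamic volume: additionally subtracts a per-pixel residual flow.
    """
    x = np.arange(len(ref), dtype=float)
    res = np.zeros_like(x) if residual is None else residual
    return np.stack([np.abs(ref - sample(src, x - d - res)) for d in shifts])


# Toy 1-D scene: camera motion shifts everything by 2 px, and an object at
# x = 10..13 moves an extra 3 px on its own.
width, true_shift = 20, 2
src = np.arange(width, dtype=float)            # distinct intensity per pixel
true_residual = np.zeros(width)
true_residual[10:14] = 3.0
x = np.arange(width, dtype=float)
ref = sample(src, x - true_shift - true_residual)

# Hypothetical residual-flow estimate: correct on the object, but with a
# spurious 1 px flow on background pixel 15 (the "extra noise" of a dynamic volume).
est_residual = true_residual.copy()
est_residual[15] = 1.0

shifts = np.arange(7)
static = cost_volume(ref, src, shifts)
dynamic = cost_volume(ref, src, shifts, est_residual)

# Hypothetical fusion stand-in: use the dynamic volume inside an assumed
# object mask and the static volume elsewhere (the paper's module is learned).
mask = np.zeros(width)
mask[10:14] = 1.0
fused = mask * dynamic + (1.0 - mask) * static

print(shifts[static.argmin(axis=0)][[11, 15]])   # [5 2]: static is fooled on the object
print(shifts[dynamic.argmin(axis=0)][[11, 15]])  # [2 1]: dynamic is fooled by the noise
print(shifts[fused.argmin(axis=0)][[11, 15]])    # [2 2]: each volume covers the other
```

Pixel 11 lies on the moving object and pixel 15 on the background; the correct hypothesis is a shift of 2 for both. The static volume misreads the object because it ignores object motion, and the dynamic volume misreads the background because of the noisy residual, which mirrors the complementary failure modes the abstract describes.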

updated: Mon Aug 14 2023 15:57:42 GMT+0000 (UTC)

published: Mon Aug 14 2023 15:57:42 GMT+0000 (UTC)

arXiv
