Uncertainty-Driven Dense Two-View Structure from Motion

Weirong Chen; Suryansh Kumar; Fisher Yu

This work introduces an effective and practical solution to the dense two-view structure from motion (SfM) problem. One key question it addresses is how to use per-pixel optical flow correspondences between two frames judiciously for accurate pose estimation, given that perfect per-pixel correspondence between two images is difficult, if not impossible, to establish. From the carefully estimated camera pose and the predicted per-pixel optical flow correspondences, a dense depth map of the scene is computed. An iterative refinement procedure then further improves optical flow matching confidence, camera pose, and depth by exploiting their inherent dependency in rigid SfM. The fundamental idea is to exploit per-pixel uncertainty in the optical flow estimate and to make the dense SfM system robust through online refinement. Concretely, we introduce a pipeline consisting of (i) an uncertainty-aware dense optical flow estimation approach that provides per-pixel correspondences together with a matching confidence score for each; (ii) a weighted dense bundle adjustment formulation that relies on optical flow uncertainty and bidirectional optical flow consistency to refine both pose and depth; and (iii) a depth estimation network that accounts for its consistency with the estimated poses and with optical flow that respects the epipolar constraint. Extensive experiments show that the proposed approach achieves remarkable depth accuracy and state-of-the-art camera pose results, surpassing the accuracy of SuperPoint and SuperGlue, when tested on benchmark datasets such as DeMoN, YFCC100M, and ScanNet.
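To make the role of bidirectional optical flow consistency concrete, the following is a minimal NumPy sketch of one common way to turn a forward-backward check and a per-pixel confidence map into matching weights. It is an illustration only, not the formulation used in the paper: the function names `forward_backward_error` and `match_weights`, the nearest-neighbour lookup, the Gaussian falloff, and the `sigma` parameter are all assumptions introduced here.

```python
import numpy as np


def forward_backward_error(flow_fw, flow_bw):
    """Per-pixel forward-backward flow inconsistency, in pixels.

    flow_fw, flow_bw: arrays of shape (H, W, 2) holding (dx, dy) displacements
    from image 1 to image 2 and from image 2 to image 1, respectively.
    """
    h, w = flow_fw.shape[:2]
    ys, xs = np.mgrid[0:h, 0:w]
    # Where each pixel of image 1 lands in image 2 (nearest-neighbour lookup).
    x2 = np.rint(xs + flow_fw[..., 0]).astype(int)
    y2 = np.rint(ys + flow_fw[..., 1]).astype(int)
    inside = (x2 >= 0) & (x2 < w) & (y2 >= 0) & (y2 < h)
    x2c = np.clip(x2, 0, w - 1)
    y2c = np.clip(y2, 0, h - 1)
    # A consistent pair satisfies flow_fw(p) + flow_bw(p + flow_fw(p)) ≈ 0.
    round_trip = flow_fw + flow_bw[y2c, x2c]
    err = np.linalg.norm(round_trip, axis=-1)
    # Pixels that leave the image cannot be checked; treat them as inconsistent.
    err[~inside] = np.inf
    return err


def match_weights(flow_fw, flow_bw, confidence, sigma=1.0):
    """Combine a confidence map of shape (H, W) with the consistency check.

    The Gaussian falloff and sigma (in pixels) are illustrative assumptions.
    """
    err = forward_backward_error(flow_fw, flow_bw)
    return confidence * np.exp(-(err / sigma) ** 2)
```

Weights of this kind could then scale per-pixel residuals in a weighted least-squares objective, so that uncertain or inconsistent matches contribute little to the pose and depth estimates.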

updated: Wed Feb 01 2023 15:52:24 GMT+0000 (UTC)

published: Wed Feb 01 2023 15:52:24 GMT+0000 (UTC)

arXiv
