Towards Accurate Reconstruction of 3D Scene Shape from A Single Monocular Image

Wei Yin; Jianming Zhang; Oliver Wang; Simon Niklaus; Simon Chen; Yifan Liu; Chunhua Shen

Despite significant progress made in the past few years, challenges remain for depth estimation using a single monocular image. First, it is nontrivial to train a metric-depth prediction model that can generalize well to diverse scenes mainly due to limited training data. Thus, researchers have built large-scale relative depth datasets that are much easier to collect. However, existing relative depth estimation models often fail to recover accurate 3D scene shapes due to the unknown depth shift caused by training with the relative depth data. We tackle this problem here and attempt to estimate accurate scene shapes by training on large-scale relative depth data, and estimating the depth shift. To do so, we propose a two-stage framework that first predicts depth up to an unknown scale and shift from a single monocular image, and then exploits 3D point cloud data to predict the depth shift and the camera's focal length that allow us to recover 3D scene shapes. As the two modules are trained separately, we do not need strictly paired training data. In addition, we propose an image-level normalized regression loss and a normal-based geometry loss to improve training with relative depth annotation. We test our depth model on nine unseen datasets and achieve state-of-the-art performance on zero-shot evaluation. Code is available at: https://git.io/Depth
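The role of the depth shift and focal length in recovering 3D shape can be illustrated with a standard pinhole back-projection. The following is only a minimal sketch, not the authors' implementation: the function name `unproject_depth`, the additive sign convention for the shift, and the choice of the image centre as principal point are all assumptions made for illustration. The overall scale stays unknown, so the resulting point cloud is correct only up to scale.

```python
from typing import List, Tuple

Point = Tuple[float, float, float]


def unproject_depth(
    depth: List[List[float]],
    focal_length: float,
    shift: float = 0.0,
) -> List[Point]:
    """Back-project a depth map to 3D points with a pinhole camera model.

    depth        -- H x W nested list of predicted depth values
    focal_length -- focal length in pixels (hypothetical input; the paper
                    predicts it from point cloud data)
    shift        -- depth shift added to every pixel (sign convention is
                    an assumption of this sketch)
    """
    height = len(depth)
    width = len(depth[0])
    # Assumption: the principal point lies at the image centre.
    cx = (width - 1) / 2.0
    cy = (height - 1) / 2.0

    points: List[Point] = []
    for v, row in enumerate(depth):
        for u, d in enumerate(row):
            z = d + shift                      # corrected depth
            x = (u - cx) * z / focal_length    # horizontal coordinate
            y = (v - cy) * z / focal_length    # vertical coordinate
            points.append((x, y, z))
    return points


if __name__ == "__main__":
    flat = [[2.0, 2.0], [2.0, 2.0]]
    for p in unproject_depth(flat, focal_length=1.0, shift=1.0):
        print(p)
```

Because `x` and `y` depend on the corrected depth `z` and on `focal_length`, an error in either the shift or the focal length changes the geometry of the point cloud, not just its size. This is why a depth map known only up to scale and shift is not enough on its own to recover the scene shape.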

updated: Sun Aug 28 2022 16:20:14 GMT+0000 (UTC)

published: Sun Aug 28 2022 16:20:14 GMT+0000 (UTC)

arXiv
