Toward Practical Monocular Indoor Depth Estimation

Cho-Ying Wu; Jialiang Wang; Michael Hall; Ulrich Neumann; Shuochen Su

Most prior monocular depth estimation methods that train without ground-truth depth supervision focus on driving scenarios. We show that such methods generalize poorly to unseen complex indoor scenes, where objects are cluttered and arbitrarily arranged in the near field. To improve robustness, we propose a structure distillation approach that learns structural knowledge from an off-the-shelf relative depth estimator, which produces structured but metric-agnostic depth. By combining structure distillation with a branch that learns metric scale from left-right consistency, we obtain structured, metric depth for generic indoor scenes and run inference in real time. To facilitate learning and evaluation, we collect SimSIN, a simulated dataset spanning thousands of environments, and UniSIN, a dataset containing about 500 real scan sequences of generic indoor environments. We experiment in both sim-to-real and real-to-real settings and show improvements in depth estimation as well as in downstream applications that use our depth maps. This work provides a full study covering methods, data, and applications.
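To make the phrase "structured but metric-agnostic" concrete, the sketch below shows one common way to relate a relative depth map to metric depth: a least-squares fit of a single scale and shift. This is only an illustration and not the training procedure described in the abstract. It assumes the relative prediction differs from metric depth by an unknown affine map, and the function names `fit_scale_shift` and `to_metric` are hypothetical.

```python
from typing import List, Tuple


def fit_scale_shift(relative: List[float], metric: List[float]) -> Tuple[float, float]:
    """Closed-form least-squares fit of metric ~= scale * relative + shift.

    Assumption (not from the abstract): relative depth and metric depth
    differ only by one global scale and one global shift.
    """
    n = len(relative)
    if n != len(metric) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 samples")

    mean_r = sum(relative) / n
    mean_m = sum(metric) / n

    # Variance of the relative depth and its covariance with metric depth.
    var_r = sum((r - mean_r) ** 2 for r in relative)
    if var_r == 0.0:
        raise ValueError("relative depth is constant; scale is undetermined")
    cov = sum((r - mean_r) * (m - mean_m) for r, m in zip(relative, metric))

    scale = cov / var_r
    shift = mean_m - scale * mean_r
    return scale, shift


def to_metric(relative: List[float], scale: float, shift: float) -> List[float]:
    """Apply the fitted affine map to every relative depth value."""
    return [scale * r + shift for r in relative]


if __name__ == "__main__":
    # Toy values: relative depth in [0, 1], metric depth in meters.
    rel = [0.0, 0.5, 1.0]
    met = [1.0, 2.0, 3.0]
    s, b = fit_scale_shift(rel, met)
    print(s, b, to_metric(rel, s, b))
```

The fit needs some metric reference to recover the scale and shift, which is why a relative estimator alone cannot yield metric depth. In the abstract, that metric signal comes from the branch trained with left-right consistency.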

updated: Mon Mar 28 2022 22:03:03 GMT+0000 (UTC)

published: Sat Dec 04 2021 11:02:56 GMT+0000 (UTC)

arXiv
