Gated2Gated: Self-Supervised Depth Estimation from Gated Images

Amanpreet Walia; Stefanie Walz; Mario Bijelic; Fahim Mannan; Frank Julca-Aguilar; Michael Langer; Werner Ritter; Felix Heide

Gated cameras hold promise as an alternative to scanning LiDAR sensors with high-resolution 3D depth that is robust to back-scatter in fog, snow, and rain. Instead of sequentially scanning a scene and directly recording depth via the photon time-of-flight, as in pulsed LiDAR sensors, gated imagers encode depth in the relative intensity of a handful of gated slices, captured at megapixel resolution. Although existing methods have shown that it is possible to decode high-resolution depth from such measurements, these methods require synchronized and calibrated LiDAR to supervise the gated depth decoder -- prohibiting fast adoption across geographies, training on large unpaired datasets, and exploring alternative applications outside of automotive use cases. In this work, we fill this gap and propose an entirely self-supervised depth estimation method that uses gated intensity profiles and temporal consistency as a training signal. The proposed model is trained end-to-end from gated video sequences, does not require LiDAR or RGB data, and learns to estimate absolute depth values. We take gated slices as input and disentangle the estimation of the scene albedo, depth, and ambient light, which are then used to learn to reconstruct the input slices through a cyclic loss. We rely on temporal consistency between a given frame and neighboring gated slices to estimate depth in regions with shadows and reflections. We experimentally validate that the proposed approach outperforms existing supervised and self-supervised depth estimation methods based on monocular RGB and stereo images, as well as supervised methods based on gated images.
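
The cyclic reconstruction idea described in the abstract can be illustrated with a per-pixel toy example. This is only a sketch under stated assumptions, not the authors' model: the abstract does not give the image formation equations, so the relation `slice = albedo * profile(depth) + ambient`, the Gaussian-shaped range-intensity profiles, their centers and widths, and the function names `profile`, `render_slices`, and `cyclic_loss` are all hypothetical choices made for illustration.

```python
import math

# Hypothetical range-intensity profiles: one Gaussian bump per gated slice.
# Centers and widths (in meters) are made-up illustration values.
PROFILES = [(30.0, 15.0), (60.0, 20.0), (100.0, 30.0)]


def profile(depth, center, width):
    """Assumed response of one gate to a surface at `depth` meters."""
    return math.exp(-0.5 * ((depth - center) / width) ** 2)


def render_slices(albedo, depth, ambient):
    """Re-synthesize the gated slices for one pixel from the three estimates."""
    return [albedo * profile(depth, c, w) + ambient for c, w in PROFILES]


def cyclic_loss(measured, albedo, depth, ambient):
    """Mean absolute error between measured and re-rendered slices."""
    rendered = render_slices(albedo, depth, ambient)
    return sum(abs(m - r) for m, r in zip(measured, rendered)) / len(measured)


# Simulate one pixel, then compare the loss at the true and a wrong depth.
measured = render_slices(albedo=0.8, depth=45.0, ambient=0.05)
loss_true = cyclic_loss(measured, 0.8, 45.0, 0.05)
loss_wrong = cyclic_loss(measured, 0.8, 80.0, 0.05)
```

In this toy setting the reconstruction error vanishes at the depth that generated the slices and grows for a wrong depth, which is the property that lets the measured slices themselves act as a training signal without LiDAR supervision.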

updated: Sat Dec 04 2021 19:47:38 GMT+0000 (UTC)

published: Sat Dec 04 2021 19:47:38 GMT+0000 (UTC)

arXiv
