UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models

Varun Ravi Kumar; Senthil Yogamani; Markus Bach; Christian Witt; Stefan Milz; Patrick Mader

In classical computer vision, rectification is an integral part of multi-view depth estimation. It typically includes epipolar rectification and lens distortion correction. This process simplifies depth estimation significantly, and thus it has been adopted in CNN approaches. However, rectification has several side effects, including a reduced field of view (FOV), resampling distortion, and sensitivity to calibration errors. These effects are particularly pronounced in the case of significant distortion (e.g., wide-angle fisheye cameras). In this paper, we propose a generic scale-aware self-supervised pipeline for estimating depth, Euclidean distance, and visual odometry from unrectified monocular videos. On the unrectified KITTI dataset, which exhibits barrel distortion, we demonstrate a level of precision comparable to that obtained on the rectified KITTI dataset. The intuition is that the rectification step can be implicitly absorbed within the CNN model, which learns the distortion model without increasing complexity. Our approach does not suffer from a reduced field of view and avoids the computational cost of rectification at inference time. To further illustrate the general applicability of the proposed framework, we apply it to wide-angle fisheye cameras with a 190° horizontal field of view. The training framework UnRectDepthNet takes the camera distortion model as an argument and adapts the projection and unprojection functions accordingly. The proposed algorithm is further evaluated on the rectified KITTI dataset, where we achieve state-of-the-art results that improve upon our previous work, FisheyeDistanceNet. Qualitative results on a distorted test scene video sequence indicate excellent performance (https://youtu.be/K6pbx3bU4Ss).
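To make the idea of a framework that "takes the camera distortion model as an argument" concrete, the following is a minimal sketch, not the authors' implementation. It assumes two textbook camera models, an ideal pinhole and an equidistant fisheye (r = f·θ), which are stand-ins chosen for illustration; the abstract does not specify the distortion models actually used. All names (`CAMERA_MODELS`, `get_camera_functions`, and the four model functions) are hypothetical.

```python
import math
from typing import Callable, Dict, Tuple

Point3 = Tuple[float, float, float]
Pixel = Tuple[float, float]


def project_pinhole(p: Point3, f: float, cx: float, cy: float) -> Pixel:
    """Ideal pinhole projection (assumed model; requires z > 0)."""
    x, y, z = p
    return (f * x / z + cx, f * y / z + cy)


def unproject_pinhole(uv: Pixel, depth: float, f: float, cx: float, cy: float) -> Point3:
    """Lift a pixel to 3D using depth measured along the optical axis."""
    u, v = uv
    return ((u - cx) / f * depth, (v - cy) / f * depth, depth)


def project_equidistant(p: Point3, f: float, cx: float, cy: float) -> Pixel:
    """Equidistant fisheye projection r = f * theta (assumed model)."""
    x, y, z = p
    chi = math.hypot(x, y)          # distance from the optical axis
    if chi == 0.0:
        return (cx, cy)             # point on the optical axis
    theta = math.atan2(chi, z)      # angle of incidence
    r = f * theta                   # radial distance on the image plane
    return (r * x / chi + cx, r * y / chi + cy)


def unproject_equidistant(uv: Pixel, distance: float, f: float, cx: float, cy: float) -> Point3:
    """Lift a pixel to 3D using Euclidean distance from the camera centre."""
    u, v = uv
    dx, dy = u - cx, v - cy
    r = math.hypot(dx, dy)
    if r == 0.0:
        return (0.0, 0.0, distance)
    theta = r / f
    s = math.sin(theta)
    return (distance * s * dx / r, distance * s * dy / r, distance * math.cos(theta))


# Hypothetical registry: the camera model name selects the function pair.
CAMERA_MODELS: Dict[str, Tuple[Callable, Callable]] = {
    "pinhole": (project_pinhole, unproject_pinhole),
    "equidistant": (project_equidistant, unproject_equidistant),
}


def get_camera_functions(model: str) -> Tuple[Callable, Callable]:
    """Return the (project, unproject) pair for the named camera model."""
    return CAMERA_MODELS[model]


if __name__ == "__main__":
    project, unproject = get_camera_functions("equidistant")
    point = (1.0, 2.0, 2.0)                       # Euclidean norm is 3
    pixel = project(point, 100.0, 50.0, 40.0)
    print(unproject(pixel, 3.0, 100.0, 50.0, 40.0))
```

In this sketch the fisheye unprojection is parameterised by Euclidean distance rather than depth along the optical axis, which mirrors the abstract's distinction between the two quantities. The equidistant model also remains well defined for incidence angles beyond 90°, where the pinhole projection breaks down.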

updated: Tue Jun 06 2023 14:26:28 GMT+0000 (UTC)

published: Mon Jul 13 2020 20:35:05 GMT+0000 (UTC)

arXiv
