M4Depth: A motion-based approach for monocular depth estimation on video sequences

Michaël Fonder; Damien Ernst; Marc Van Droogenbroeck

Getting the distance to objects is crucial for autonomous vehicles. In instances where depth sensors cannot be used, this distance has to be estimated from RGB cameras. Compared with cars, estimating depth from on-board cameras is more complex on drones because motion during flight is unconstrained. In this paper, we present a method to estimate the distance of objects seen by an on-board camera using its RGB video stream and drone motion information. Our method is built upon a pyramidal convolutional neural network architecture and combines time recurrence with the geometric constraints imposed by motion to produce pixel-wise depth maps. In our architecture, each level of the pyramid is designed to produce its own depth estimate based on past observations and on information provided by the previous level in the pyramid. We introduce a spatial reprojection layer to maintain the spatio-temporal consistency of the data between the levels. We analyse the performance of our approach on Mid-Air, a public drone dataset featuring synthetic drone trajectories recorded in a wide variety of unstructured outdoor environments. Our experiments show that our network outperforms state-of-the-art depth estimation methods and that the use of motion information is the main contributing factor for this improvement. The code of our method is publicly available on GitHub; see https://github.com/michael-fonder/M4Depth
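The abstract does not describe how the spatial reprojection layer is implemented. As a rough illustration of the kind of geometric constraint that known camera motion imposes, the sketch below reprojects a single pixel with a known depth from a previous frame into the current frame using a standard pinhole camera model. The function name `reproject_pixel`, its parameters, and the numeric values are hypothetical and are not taken from the M4Depth code base.

```python
from typing import Sequence, Tuple


def reproject_pixel(
    u: float, v: float, depth: float,
    fx: float, fy: float, cx: float, cy: float,
    rotation: Sequence[Sequence[float]],
    translation: Sequence[float],
) -> Tuple[float, float, float]:
    """Reproject pixel (u, v) with known depth into another camera frame.

    Assumptions (illustrative only): a pinhole camera with focal lengths
    (fx, fy) and principal point (cx, cy); `rotation` (3x3) and
    `translation` (3,) map point coordinates from the previous camera
    frame to the current camera frame.
    """
    # Back-project the pixel to a 3D point in the previous camera frame.
    x = (u - cx) / fx * depth
    y = (v - cy) / fy * depth
    z = depth

    # Apply the rigid motion to express the point in the current frame.
    p = [
        rotation[i][0] * x + rotation[i][1] * y + rotation[i][2] * z
        + translation[i]
        for i in range(3)
    ]
    if p[2] <= 0.0:
        raise ValueError("point is behind the current camera")

    # Project the point onto the current image plane.
    return fx * p[0] / p[2] + cx, fy * p[1] / p[2] + cy, p[2]


IDENTITY = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

# Hypothetical example: the camera moves 1 m forward along its optical
# axis, so static points end up 1 m closer in the current frame.
u_new, v_new, depth_new = reproject_pixel(
    420.0, 240.0, 10.0,
    fx=500.0, fy=500.0, cx=320.0, cy=240.0,
    rotation=IDENTITY, translation=[0.0, 0.0, -1.0],
)
```

In this example the pixel drifts away from the principal point as the camera approaches the point. How far a pixel moves for a given camera motion depends on its depth, which is why motion information can help constrain depth estimates.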

updated: Thu May 20 2021 15:46:02 GMT+0000 (UTC)

published: Thu May 20 2021 15:46:02 GMT+0000 (UTC)

arXiv
