Dilated Fully Convolutional Neural Network for Depth Estimation from a Single Image

Binghan Li; Yindong Hua; Yifeng Liu; Mi Lu

Depth prediction plays a key role in understanding a 3D scene. Several techniques have been developed over the years, among which Convolutional Neural Networks (CNNs) have recently achieved state-of-the-art performance in estimating depth from a single image. However, traditional CNNs suffer from the reduced resolution and information loss caused by pooling layers. In addition, the large number of parameters in fully connected layers often leads to excessive memory usage. In this paper, we present an advanced Dilated Fully Convolutional Neural Network to address these deficiencies. Taking advantage of the exponential expansion of the receptive field in dilated convolutions, our model can minimize the loss of resolution. It also reduces the number of parameters significantly by replacing the fully connected layers with fully convolutional layers. We show experimentally on the NYU Depth V2 dataset that the depth prediction obtained from our model is considerably closer to the ground truth than that obtained from traditional CNN techniques.
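
The two mechanisms named in the abstract can be illustrated with simple arithmetic. The sketch below is not the authors' architecture: the layer counts, kernel sizes, channel widths, and the function names receptive_field, fc_params, and conv_params are assumptions chosen only for illustration. It compares the receptive field of stacked 3x3 stride-1 convolutions with and without doubling dilation rates, and the parameter count of one fully connected layer against one convolutional layer.

```python
def receptive_field(kernel_size, dilations):
    """Receptive field along one axis of stacked stride-1 convolutions.

    Each layer with dilation d widens the field by (kernel_size - 1) * d.
    """
    rf = 1
    for d in dilations:
        rf += (kernel_size - 1) * d
    return rf


def fc_params(in_features, out_features):
    """Weights plus biases of a fully connected layer."""
    return in_features * out_features + out_features


def conv_params(in_channels, out_channels, kernel_size):
    """Weights plus biases of a 2D convolution with a square kernel."""
    return in_channels * out_channels * kernel_size * kernel_size + out_channels


# Hypothetical setup: four 3x3 layers, no pooling or striding.
num_layers = 4
plain = receptive_field(3, [1] * num_layers)
dilated = receptive_field(3, [2 ** i for i in range(num_layers)])  # dilations 1, 2, 4, 8

# Hypothetical sizes: a 512 x 7 x 7 feature map feeding 4096 units,
# versus a 3x3 convolution mapping 512 channels to 512 channels.
fc = fc_params(512 * 7 * 7, 4096)
conv = conv_params(512, 512, 3)

print(f"receptive field, plain 3x3 stack:   {plain}")
print(f"receptive field, dilated 3x3 stack: {dilated}")
print(f"fully connected parameters: {fc:,}")
print(f"convolutional parameters:   {conv:,}")
```

With doubling dilation rates the receptive field grows exponentially in the number of layers while the feature-map resolution is untouched, which is the property the abstract relies on to avoid pooling. The parameter comparison shows why swapping a fully connected layer for a convolutional one can cut memory use, although the actual savings depend on the real layer sizes in the paper.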

updated: Fri Mar 12 2021 23:19:32 GMT+0000 (UTC)

published: Fri Mar 12 2021 23:19:32 GMT+0000 (UTC)

arXiv
