HR-Depth: High Resolution Self-Supervised Monocular Depth Estimation

Xiaoyang Lyu; Liang Liu; Mengmeng Wang; Xin Kong; Lina Liu; Yong Liu; Xinxin Chen; Yi Yuan

Self-supervised learning shows great potential in monocular depth estimation, using image sequences as the only source of supervision. Although high-resolution images have been used for depth estimation, prediction accuracy has not improved significantly. In this work, we find that the core reason is inaccurate depth estimation in large-gradient regions, which makes the bilinear interpolation error gradually disappear as the resolution increases. To obtain more accurate depth estimation in large-gradient regions, it is necessary to obtain high-resolution features with both spatial and semantic information. Therefore, we present an improved DepthNet, HR-Depth, with two effective strategies: (1) redesigning the skip connections in DepthNet to obtain better high-resolution features and (2) proposing a feature fusion Squeeze-and-Excitation (fSE) module to fuse features more efficiently. Using ResNet-18 as the encoder, HR-Depth surpasses all previous state-of-the-art (SoTA) methods with the fewest parameters at both high and low resolution. Moreover, previous state-of-the-art methods are based on fairly complex and deep networks with a large number of parameters, which limits their real-world application. Thus, we also construct a lightweight network that uses MobileNetV3 as the encoder. Experiments show that the lightweight network can perform on par with many large models like Monodepth2 at high resolution with only 20% of the parameters. All code and models will be available at https://github.com/shawLyu/HR-Depth.
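The abstract names the fSE module but does not give its layer-level details. As a rough illustration of the general idea only, the sketch below applies a standard Squeeze-and-Excitation block to concatenated feature maps and then mixes channels with a 1x1 convolution. It is not taken from the HR-Depth code: the function name `se_fuse`, the weight shapes, the reduction ratio `r`, and the random weights are all assumptions made for this example.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def se_fuse(features, w1, w2, w_out):
    """Fuse feature maps with SE-style channel attention (illustrative only).

    features: list of arrays shaped (C_i, H, W) with identical H and W
    w1:       (C // r, C) weights of the bottleneck layer
    w2:       (C, C // r) weights of the expansion layer
    w_out:    (C_out, C) weights of a 1x1 convolution
    """
    x = np.concatenate(features, axis=0)        # stack channels -> (C, H, W)
    squeezed = x.mean(axis=(1, 2))              # global average pool -> (C,)
    hidden = np.maximum(w1 @ squeezed, 0.0)     # ReLU bottleneck -> (C // r,)
    gates = sigmoid(w2 @ hidden)                # channel weights in (0, 1)
    reweighted = x * gates[:, None, None]       # rescale each channel
    # A 1x1 convolution is a per-pixel linear map over channels.
    fused = np.einsum("oc,chw->ohw", w_out, reweighted)
    return fused, gates


# Hypothetical inputs: a skip-connection feature and an upsampled decoder feature.
rng = np.random.default_rng(0)
skip = rng.standard_normal((4, 8, 8))
up = rng.standard_normal((4, 8, 8))

C, r, C_out = 8, 2, 4                           # assumed sizes
w1 = rng.standard_normal((C // r, C)) * 0.1
w2 = rng.standard_normal((C, C // r)) * 0.1
w_out = rng.standard_normal((C_out, C)) * 0.1

fused, gates = se_fuse([skip, up], w1, w2, w_out)
print(fused.shape, gates.shape)
```

In a trained network the three weight matrices would be learned; here they are random, so only the shapes and the data flow are meaningful. The gating step lets the block emphasize or suppress whole channels before they are merged, which is the usual motivation for Squeeze-and-Excitation when combining features from different depths.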

updated: Mon Dec 14 2020 09:15:15 GMT+0000 (UTC)

published: Mon Dec 14 2020 09:15:15 GMT+0000 (UTC)

arXiv
