Global-Local Path Networks for Monocular Depth Estimation with Vertical CutDepth

Doyeon Kim; Woonghyun Ga; Pyungwhan Ahn; Donggyu Joo; Sehwan Chun; Junmo Kim

Depth estimation from a single image is an important task with applications across computer vision, and the field has advanced rapidly with the development of convolutional neural networks. In this paper, we propose a novel architecture and training strategy for monocular depth estimation to further improve prediction accuracy. We deploy a hierarchical transformer encoder to capture and convey the global context, and design a lightweight yet powerful decoder that generates the estimated depth map while considering local connectivity. By constructing connected paths between multi-scale local features and the global decoding stream with our proposed selective feature fusion module, the network can integrate both representations and recover fine details. The proposed decoder also outperforms previously proposed decoders at considerably lower computational complexity. Furthermore, we improve a depth-specific augmentation method by exploiting an important observation about depth estimation. Our network achieves state-of-the-art performance on the challenging NYU Depth V2 dataset. Extensive experiments validate the effectiveness of the proposed approach. Finally, our model shows better generalisation ability and robustness than the other models compared.
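
The abstract names the depth-specific augmentation (Vertical CutDepth in the title) but does not describe it. The sketch below is one plausible reading, not the authors' implementation: it assumes the augmentation pastes a random full-height vertical strip of the ground-truth depth map into the input image. The function name `vertical_cutdepth`, the scale parameter `p`, the way the strip position is sampled, and the single-channel list-of-rows image format are all illustrative assumptions.

```python
import random


def vertical_cutdepth(image, depth, p=0.75, rng=None):
    """Paste a full-height vertical strip of `depth` into a copy of `image`.

    image, depth: 2-D lists (H rows x W columns) of numbers. A single channel
    is assumed for simplicity. `p` (hypothetical) caps the strip width.
    Returns the augmented copy and the (left, right) column range replaced.
    """
    rng = rng or random.Random()
    height, width = len(image), len(image[0])

    # Sample the strip's start column and width (assumed sampling scheme).
    alpha, beta = rng.random(), rng.random()
    left = int(alpha * width)
    strip = max(int((width - left) * beta * p), 1)  # at least one column
    right = min(left + strip, width)

    out = [row[:] for row in image]  # copy so the input stays untouched
    for y in range(height):  # every row: the strip spans the full height
        out[y][left:right] = depth[y][left:right]
    return out, (left, right)


if __name__ == "__main__":
    img = [[0.0] * 8 for _ in range(4)]
    dep = [[1.0] * 8 for _ in range(4)]
    augmented, cols = vertical_cutdepth(img, dep, rng=random.Random(0))
    print(cols)
    for row in augmented:
        print(row)
```

Under these assumptions the replaced region always covers every row, so the vertical layout of the scene is kept intact while only the horizontal extent of the pasted depth strip varies.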

updated: Wed Jan 19 2022 06:37:21 GMT+0000 (UTC)

published: Wed Jan 19 2022 06:37:21 GMT+0000 (UTC)

arXiv
