Self-Supervised Monocular Depth Estimation with Internal Feature Fusion

Hang Zhou; David Greenwood; Sarah Taylor

Self-supervised learning for depth estimation uses the geometry of image sequences as its supervision signal and has shown promising results. As in many computer vision tasks, the performance of a depth network depends on its ability to learn accurate spatial and semantic representations from images. It is therefore natural to exploit semantic segmentation networks for depth estimation. In this work, building on HRNet, a well-developed semantic segmentation network, we propose DIFFNet, a novel depth estimation network that makes use of semantic information in both the downsampling and upsampling procedures. By applying feature fusion and an attention mechanism, our method outperforms state-of-the-art monocular depth estimation methods on the KITTI benchmark. It also demonstrates greater potential on higher-resolution training data. We additionally propose an extended evaluation strategy by establishing a test set of challenging cases, empirically derived from the standard benchmark.
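As a rough illustration of what "feature fusion and an attention mechanism" can look like in general, the following is a minimal NumPy sketch. It is not the DIFFNet module: the abstract does not specify the architecture, so the squeeze-and-excitation style channel attention, the nearest-neighbour upsampling, the function names, the tensor shapes, and the random weights are all assumptions chosen for illustration.

```python
import numpy as np


def upsample_nearest(feat, factor):
    """Nearest-neighbour upsampling of a (C, H, W) feature map."""
    return feat.repeat(factor, axis=1).repeat(factor, axis=2)


def channel_attention(feat, w1, w2):
    """Reweight channels of a (C, H, W) map (squeeze-and-excitation style).

    w1 and w2 are hypothetical learned weights; here they are just arrays.
    """
    squeezed = feat.mean(axis=(1, 2))                # global average pool -> (C,)
    hidden = np.maximum(w1 @ squeezed, 0.0)          # ReLU bottleneck -> (C // r,)
    weights = 1.0 / (1.0 + np.exp(-(w2 @ hidden)))   # sigmoid gate -> (C,)
    return feat * weights[:, None, None], weights


def fuse_with_attention(high_res, low_res, w1, w2):
    """Concatenate a high- and a low-resolution map, then apply attention."""
    factor = high_res.shape[1] // low_res.shape[1]
    fused = np.concatenate(
        [high_res, upsample_nearest(low_res, factor)], axis=0
    )
    return channel_attention(fused, w1, w2)


# Illustrative shapes only: 4 channels at 8x8 and 8 channels at 4x4.
rng = np.random.default_rng(0)
high = rng.standard_normal((4, 8, 8))
low = rng.standard_normal((8, 4, 4))
channels = high.shape[0] + low.shape[0]
w1 = rng.standard_normal((channels // 4, channels)) * 0.1
w2 = rng.standard_normal((channels, channels // 4)) * 0.1

out, weights = fuse_with_attention(high, low, w1, w2)
print(out.shape)  # (12, 8, 8)
```

The fused map keeps the spatial size of the high-resolution branch, and each channel is scaled by a gate between 0 and 1, which is one common way to let a network emphasise the more informative features after fusion.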

updated: Mon Oct 18 2021 17:31:11 GMT+0000 (UTC)

published: Mon Oct 18 2021 17:31:11 GMT+0000 (UTC)

arXiv
