Occlusion-aware Unsupervised Learning of Depth from 4-D Light Fields

Jing Jin; Junhui Hou

Depth estimation is a fundamental problem in 4-D light field processing and analysis. Recent supervised learning-based methods for light field depth estimation are significantly more accurate and efficient than traditional optimization-based ones, but they must be trained on light field data with ground-truth depth maps, which are difficult to obtain, or simply unavailable, for real-world light fields. Moreover, because of the inevitable gap (domain difference) between real-world and synthetic data, models trained on synthetic data may suffer serious performance degradation when applied to real-world data. By contrast, we propose an unsupervised learning-based method that requires no ground-truth depth as supervision during training. Specifically, building on the unique geometric structure of light field data, we present an occlusion-aware strategy that improves accuracy in occluded regions: we exploit the angular coherence among subsets of the light field views to estimate initial depth maps, and use a constrained unsupervised loss to learn their reliability for the final depth prediction. Additionally, we adopt a multi-scale network with a weighted smoothness loss to handle textureless regions. Experimental results on synthetic data show that our method significantly narrows the performance gap between the previous unsupervised method and supervised ones, and produces depth maps with accuracy comparable to that of traditional methods at a markedly lower computational cost. Moreover, experiments on real-world datasets show that our method avoids the domain shift problem that affects supervised methods, demonstrating its potential.
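
The abstract names two ingredients without giving their formulas: a reliability-based combination of initial depth maps and a weighted smoothness loss. The sketch below illustrates one plausible reading of each in plain Python, and should not be taken as the authors' formulation. It assumes that the smoothness weight is the common edge-aware form exp(-alpha * |image gradient|) and that reliability scores are turned into per-pixel weights with a softmax. The function names `edge_aware_smoothness` and `fuse_pixel` and the parameter `alpha` are hypothetical.

```python
import math


def edge_aware_smoothness(depth, image, alpha=10.0):
    """Mean absolute depth gradient, down-weighted where the image has edges.

    `depth` and `image` are 2-D lists of floats with the same shape.
    The exp(-alpha * |image gradient|) weight is an assumed, commonly used
    form; the abstract only says the smoothness loss is "weighted".
    """
    height, width = len(depth), len(depth[0])
    total, count = 0.0, 0
    for y in range(height):
        for x in range(width):
            if x + 1 < width:  # horizontal neighbour
                d_grad = abs(depth[y][x + 1] - depth[y][x])
                i_grad = abs(image[y][x + 1] - image[y][x])
                total += d_grad * math.exp(-alpha * i_grad)
                count += 1
            if y + 1 < height:  # vertical neighbour
                d_grad = abs(depth[y + 1][x] - depth[y][x])
                i_grad = abs(image[y + 1][x] - image[y][x])
                total += d_grad * math.exp(-alpha * i_grad)
                count += 1
    return total / count if count else 0.0


def fuse_pixel(depth_candidates, reliability_scores):
    """Combine per-subset depth estimates for one pixel.

    Assumption: reliabilities become weights through a softmax, and the
    final depth is the weighted sum of the candidates.
    """
    top = max(reliability_scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in reliability_scores]
    norm = sum(exps)
    return sum(d * e / norm for d, e in zip(depth_candidates, exps))


if __name__ == "__main__":
    depth_step = [[0.0, 1.0], [0.0, 1.0]]
    image_flat = [[0.5, 0.5], [0.5, 0.5]]
    image_edge = [[0.0, 1.0], [0.0, 1.0]]
    print(edge_aware_smoothness(depth_step, image_flat))
    print(edge_aware_smoothness(depth_step, image_edge))
    print(fuse_pixel([1.0, 3.0], [0.0, 0.0]))
```

Under these assumptions, a depth discontinuity is penalized heavily in a textureless region but only lightly where it coincides with an image edge, and a view subset judged unreliable at a pixel (for example, because it sees an occluder) contributes little to the fused depth there.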

updated: Sun Jun 06 2021 06:19:50 GMT+0000 (UTC)

published: Sun Jun 06 2021 06:19:50 GMT+0000 (UTC)

arXiv
