MEStereo-Du2CNN: A Novel Dual Channel CNN for Learning Robust Depth Estimates from Multi-exposure Stereo Images for HDR 3D Applications

Rohit Choudhary; Mansi Sharma; Uma T V; Rithvik Anil

Display technologies have evolved over the years. It is critical to develop practical HDR capturing, processing, and display solutions to bring 3D technologies to the next level. Depth estimation of multi-exposure stereo image sequences is an essential task in the development of cost-effective 3D HDR video content. In this paper, we develop a novel deep architecture for multi-exposure stereo depth estimation. The proposed architecture has two novel components. First, the stereo matching technique used in traditional stereo depth estimation is revamped. For the stereo depth estimation component of our architecture, a mono-to-stereo transfer learning approach is deployed. The proposed formulation circumvents the cost volume construction requirement, which is replaced by a ResNet-based dual-encoder single-decoder CNN with different weights for feature fusion. EfficientNet-based blocks are used to learn the disparity. Second, we combine disparity maps obtained from the stereo images at different exposure levels using a robust disparity feature fusion approach. The disparity maps obtained at different exposures are merged using weight maps calculated for different quality measures. The final predicted disparity map is more robust and retains the best features that preserve the depth discontinuities. The proposed CNN offers the flexibility to train using standard dynamic range stereo data or multi-exposure low dynamic range stereo sequences. In terms of performance, the proposed model surpasses state-of-the-art monocular and stereo depth estimation methods, both quantitatively and qualitatively, on the challenging Scene Flow and differently exposed Middlebury stereo datasets. The architecture performs exceedingly well on complex natural scenes, demonstrating its usefulness for diverse 3D HDR applications.
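
The abstract does not say which quality measures drive the weight maps, so the following is only a rough sketch of the general idea of merging per-exposure disparity maps with normalized per-pixel weights. The two measures used here (well-exposedness and local contrast, borrowed from classic exposure fusion), the function names, and the parameters `sigma` and `eps` are all assumptions made for illustration, not the authors' method. Input images are assumed to be grayscale arrays scaled to [0, 1].

```python
import numpy as np

def well_exposedness(image, sigma=0.2):
    """Assumed quality measure: Gaussian closeness of intensity to mid-gray (0.5)."""
    return np.exp(-((image - 0.5) ** 2) / (2.0 * sigma ** 2))

def local_contrast(image):
    """Assumed quality measure: magnitude of a 4-neighbour Laplacian."""
    padded = np.pad(image, 1, mode="edge")
    laplacian = (padded[:-2, 1:-1] + padded[2:, 1:-1]
                 + padded[1:-1, :-2] + padded[1:-1, 2:] - 4.0 * image)
    return np.abs(laplacian)

def fuse_disparities(disparities, images, eps=1e-6):
    """Blend per-exposure disparity maps using normalized per-pixel weight maps.

    disparities: list of (H, W) disparity maps, one per exposure level.
    images:      list of (H, W) grayscale reference images in [0, 1],
                 one per exposure level, used only to compute the weights.
    Returns the fused (H, W) disparity map and the (E, H, W) weight maps.
    """
    stacked = np.stack(disparities).astype(np.float64)
    # eps keeps the weight positive in perfectly flat regions (zero contrast).
    weights = np.stack([
        well_exposedness(img) * (local_contrast(img) + eps) for img in images
    ])
    weights = weights / weights.sum(axis=0, keepdims=True)
    fused = (weights * stacked).sum(axis=0)
    return fused, weights
```

Because the weights are normalized to sum to one at every pixel, the fused value always lies between the smallest and largest per-exposure disparity at that pixel, and exposures that are badly under- or over-exposed in a region contribute less there.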

updated: Tue Jun 21 2022 13:23:22 GMT+0000 (UTC)

published: Tue Jun 21 2022 13:23:22 GMT+0000 (UTC)

arXiv
