A Unified Learning Based Framework for Light Field Reconstruction from Coded Projections

Anil Kumar Vadathya; Sharath Girish; Kaushik Mitra

A light field offers a rich way to represent the 3D world by capturing the spatio-angular dimensions of the visual signal. However, the popular way of capturing a light field (LF), via a plenoptic camera, imposes a trade-off between spatial and angular resolution. Computational imaging techniques such as compressive light field and programmable coded aperture reconstruct a full-sensor-resolution LF from coded projections obtained by multiplexing the incoming spatio-angular light field. Here, we present a unified learning framework that can reconstruct the LF from a variety of multiplexing schemes with a minimal number of coded images as input. We consider three light field capture schemes: the heterodyne capture scheme, with the code placed near the sensor; the coded aperture scheme, with the code at the camera aperture; and the dual exposure scheme, which captures a focus-defocus pair with no explicit coding. Our algorithm consists of three stages: 1) we recover the all-in-focus image from the coded image; 2) we estimate the disparity maps for all the LF views from the coded image and the all-in-focus image; 3) we render the LF by warping the all-in-focus image using the disparity maps and then refine it. For these three stages we propose three deep neural networks: ViewNet, DisparityNet and RefineNet. Our reconstructions show that our learning algorithm achieves state-of-the-art results for all three multiplexing schemes. In particular, our LF reconstructions from a focus-defocus pair are comparable to other learning-based view synthesis approaches that use multiple images. Thus, our work paves the way for capturing high-resolution LF (about a megapixel) using conventional cameras such as DSLRs. Please check our supplementary materials online (https://docs.google.com/presentation/d/1Vr-F8ZskrSd63tvnLfJ2xmEXY6OBc1Rll3XeOAtc11I/) to better appreciate the reconstructed light fields.
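
To make the warping stage and the idea of a coded projection more concrete, here is a minimal NumPy sketch. It is not the authors' implementation: the function names, the use of a single disparity map shared by all views, the sign convention of the shift, the bilinear sampling, and the uniform aperture code are all assumptions chosen for illustration. In the abstract, the disparity maps are estimated per view by a network and the warped result is further refined by a network; neither network is modeled here.

```python
import numpy as np


def warp_view(center, disparity, du, dv):
    """Backward-warp an all-in-focus image to the view at angular offset (du, dv).

    center:    (H, W) grayscale all-in-focus image
    disparity: (H, W) disparity in pixels per unit angular offset (assumed shared)
    """
    h, w = center.shape
    ys, xs = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    # Where each target pixel samples from in the center image (assumed sign).
    src_y = np.clip(ys + dv * disparity, 0, h - 1)
    src_x = np.clip(xs + du * disparity, 0, w - 1)
    # Bilinear interpolation between the four neighbouring pixels.
    y0 = np.floor(src_y).astype(int)
    x0 = np.floor(src_x).astype(int)
    y1 = np.minimum(y0 + 1, h - 1)
    x1 = np.minimum(x0 + 1, w - 1)
    wy = src_y - y0
    wx = src_x - x0
    top = (1 - wx) * center[y0, x0] + wx * center[y0, x1]
    bottom = (1 - wx) * center[y1, x0] + wx * center[y1, x1]
    return (1 - wy) * top + wy * bottom


def render_light_field(center, disparity, angular_res):
    """Render an (V, U, H, W) light field by warping the center view."""
    offsets = np.arange(angular_res) - angular_res // 2
    return np.stack([
        np.stack([warp_view(center, disparity, du, dv) for du in offsets])
        for dv in offsets
    ])


def coded_aperture_projection(light_field, code):
    """Simulate one coded image as a weighted sum of views.

    light_field: (V, U, H, W) array of views
    code:        (V, U) aperture weights (hypothetical code pattern)
    """
    return np.tensordot(code, light_field, axes=([0, 1], [0, 1]))


# Usage: a tiny synthetic scene with a constant disparity of one pixel.
center = np.arange(12, dtype=float).reshape(3, 4)
disparity = np.ones_like(center)
light_field = render_light_field(center, disparity, angular_res=3)
coded_image = coded_aperture_projection(light_field, np.full((3, 3), 1 / 9))
print(light_field.shape, coded_image.shape)
```

Backward warping is used here because it produces a value for every target pixel without leaving holes; samples that would fall outside the image are clamped to the border, which is a simplification.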

updated: Fri Oct 18 2019 21:35:45 GMT+0000 (UTC)

published: Wed Dec 26 2018 20:57:43 GMT+0000 (UTC)

arXiv
