NeuralRecon: Real-Time Coherent 3D Reconstruction from Monocular Video

Jiaming Sun; Yiming Xie; Linghao Chen; Xiaowei Zhou; Hujun Bao

We present NeuralRecon, a novel framework for real-time 3D scene reconstruction from monocular video. Unlike previous methods that estimate single-view depth maps separately on each key-frame and fuse them later, we propose to directly reconstruct local surfaces, represented as sparse TSDF volumes, for each video fragment sequentially with a neural network. A learning-based TSDF fusion module built on gated recurrent units guides the network to fuse features from previous fragments. This design allows the network to capture the local smoothness prior and the global shape prior of 3D surfaces when sequentially reconstructing them, resulting in accurate, coherent, and real-time surface reconstruction. Experiments on the ScanNet and 7-Scenes datasets show that our system outperforms state-of-the-art methods in both accuracy and speed. To the best of our knowledge, this is the first learning-based system able to reconstruct dense, coherent 3D geometry in real time.
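
To make the idea of fusing features across fragments with gated recurrent units more concrete, here is a minimal sketch of a standard GRU update applied independently to each voxel. It is an illustration under stated assumptions, not the authors' implementation: the function name `gru_fuse`, the weight names `Wz`, `Wr`, `Wh`, and the use of dense per-voxel linear maps (with no biases and no spatial context) are all hypothetical simplifications.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def gru_fuse(hidden, feat, params):
    """Fuse current-fragment features into a running hidden state.

    hidden: (N, C) per-voxel features carried over from previous fragments
    feat:   (N, C) per-voxel features of the current fragment
    params: dict of hypothetical weight matrices "Wz", "Wr", "Wh", each (2C, C)
    """
    x = np.concatenate([hidden, feat], axis=1)
    z = sigmoid(x @ params["Wz"])  # update gate: how much to overwrite
    r = sigmoid(x @ params["Wr"])  # reset gate: how much history to expose
    gated = np.concatenate([r * hidden, feat], axis=1)
    candidate = np.tanh(gated @ params["Wh"])
    # Per-channel convex combination of the old state and the candidate.
    return (1.0 - z) * hidden + z * candidate


# Usage: fuse three fragments of random features into one hidden state.
rng = np.random.default_rng(0)
N, C = 5, 4  # assumed number of voxels and feature channels
params = {k: 0.1 * rng.standard_normal((2 * C, C)) for k in ("Wz", "Wr", "Wh")}
hidden = np.zeros((N, C))
for _ in range(3):
    hidden = gru_fuse(hidden, rng.standard_normal((N, C)), params)
```

The update gate `z` decides, per voxel and per channel, how much of the accumulated state is replaced by information from the current fragment, which is the mechanism that lets later fragments refine rather than overwrite earlier geometry.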

updated: Thu Apr 01 2021 17:59:46 GMT+0000 (UTC)

published: Thu Apr 01 2021 17:59:46 GMT+0000 (UTC)

arXiv
