MonoScene: Monocular 3D Semantic Scene Completion

Anh-Quan Cao; Raoul de Charette

MonoScene proposes a 3D Semantic Scene Completion (SSC) framework in which the dense geometry and semantics of a scene are inferred from a single monocular RGB image. Unlike the existing SSC literature, which relies on 2.5D or 3D input, we solve the complex problem of 2D-to-3D scene reconstruction while jointly inferring its semantics. Our framework relies on successive 2D and 3D UNets bridged by a novel 2D-3D feature projection inspired by optics, and introduces a 3D context relation prior to enforce spatio-semantic consistency. Along with these architectural contributions, we introduce novel global scene and local frustum losses. Experiments show that we outperform the literature on all metrics and datasets while hallucinating plausible scenery even beyond the camera field of view. Our code and trained models are available at https://github.com/cv-rits/MonoScene
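
The abstract does not spell out how 2D features reach the 3D network, so the following is only a minimal sketch of one generic way such a bridge could work, not the authors' implementation. It assumes a pinhole camera, voxel centers already expressed in camera coordinates, nearest-pixel sampling, and zero features for voxels that fall outside the image. The names `project_point` and `lift_features` and the intrinsics `fx`, `fy`, `cx`, `cy` are hypothetical and chosen for illustration.

```python
from typing import List, Optional, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def project_point(p: Vec3, fx: float, fy: float,
                  cx: float, cy: float) -> Optional[Tuple[float, float]]:
    """Pinhole projection of a camera-frame 3D point to pixel coordinates.

    Returns None for points at or behind the camera plane (z <= 0).
    """
    x, y, z = p
    if z <= 0:
        return None
    return (fx * x / z + cx, fy * y / z + cy)


def lift_features(feature_map: Sequence[Sequence[Sequence[float]]],
                  voxel_centers: Sequence[Vec3],
                  fx: float, fy: float, cx: float, cy: float) -> List[List[float]]:
    """Assign each voxel the 2D feature found at its projected pixel.

    feature_map is indexed as [row][col][channel]. Voxels that project
    outside the image, or lie behind the camera, get a zero vector
    (an assumption of this sketch).
    """
    height = len(feature_map)
    width = len(feature_map[0])
    channels = len(feature_map[0][0])
    lifted = []
    for center in voxel_centers:
        feature = [0.0] * channels
        uv = project_point(center, fx, fy, cx, cy)
        if uv is not None:
            col, row = int(round(uv[0])), int(round(uv[1]))
            if 0 <= row < height and 0 <= col < width:
                feature = list(feature_map[row][col])
        lifted.append(feature)
    return lifted


if __name__ == "__main__":
    # Toy 2x2 feature map with one channel per pixel.
    fmap = [[[1.0], [2.0]],
            [[3.0], [4.0]]]
    voxels = [(1.0, 0.0, 1.0),   # projects to pixel (col=1, row=0)
              (2.0, 0.0, 2.0),   # same ray, farther away: same pixel
              (0.0, 0.0, -1.0),  # behind the camera
              (5.0, 0.0, 1.0)]   # outside the image
    print(lift_features(fmap, voxels, fx=1.0, fy=1.0, cx=0.0, cy=0.0))
```

In this sketch, all voxels along the same camera ray receive the same 2D feature, so separating occupied from empty space along that ray is left to the subsequent 3D network.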

updated: Wed Dec 01 2021 18:59:57 GMT+0000 (UTC)

published: Wed Dec 01 2021 18:59:57 GMT+0000 (UTC)

arXiv
