Semantic Scene Completion via Integrating Instances and Scene in-the-Loop

Yingjie Cai; Xuesong Chen; Chao Zhang; Kwan-Yee Lin; Xiaogang Wang; Hongsheng Li

Semantic Scene Completion aims at reconstructing a complete 3D scene with precise voxel-wise semantics from a single-view depth or RGBD image. It is a crucial but challenging problem for indoor scene understanding. In this work, we present a novel framework named Scene-Instance-Scene Network (SISNet), which takes advantage of both instance-level and scene-level semantic information. Our method is capable of inferring fine-grained shape details and of distinguishing nearby objects whose semantic categories are easily confused. The key insight is that we decouple the instances from a coarsely completed semantic scene, instead of from a raw input image, to guide the reconstruction of the instances and the overall scene. SISNet conducts iterative scene-to-instance (SI) and instance-to-scene (IS) semantic completion. Specifically, SI encodes each object's surrounding context to decouple instances from the scene effectively, and each instance can be voxelized at a higher resolution to capture finer details. With IS, fine-grained instance information is integrated back into the 3D scene, which leads to more accurate semantic scene completion. With this iterative mechanism, scene completion and instance completion benefit each other to achieve higher completion accuracy. Extensive experiments show that our proposed method consistently outperforms state-of-the-art methods on the real NYU and NYUCAD datasets and the synthetic SUNCG-RGBD dataset. The code and the supplementary material will be available at https://github.com/yjcaimeow/SISNet.
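The alternation between SI and IS described in the abstract can be pictured with a toy control-flow sketch. This is not the SISNet implementation: every function name below is hypothetical, the two "completion" steps are identity placeholders standing in for learned networks, and treating each non-zero semantic label as a single instance is a simplifying assumption. The sketch only shows the data flow: crop instances with surrounding context from a coarse semantic voxel grid, refine each at a higher resolution, then write the result back into the scene and repeat.

```python
import numpy as np


def complete_scene(scene):
    """Placeholder for a learned scene-completion network (identity here)."""
    return scene.copy()


def complete_instance(mask_hr):
    """Placeholder for a learned instance-completion network (identity here)."""
    return mask_hr.copy()


def extract_instances(scene, pad=1):
    """SI step (toy): crop a padded box around each non-zero label.

    Assumption: one instance per semantic label; the padding keeps a
    little surrounding context around each object.
    """
    instances = []
    for label in np.unique(scene):
        if label == 0:  # 0 is treated as empty space
            continue
        idx = np.argwhere(scene == label)
        lo = np.maximum(idx.min(axis=0) - pad, 0)
        hi = np.minimum(idx.max(axis=0) + 1 + pad, scene.shape)
        crop = scene[lo[0]:hi[0], lo[1]:hi[1], lo[2]:hi[2]] == label
        instances.append((label, lo, hi, crop))
    return instances


def upsample(mask, factor=2):
    """Voxelize an instance mask at a higher resolution by repetition."""
    return mask.repeat(factor, 0).repeat(factor, 1).repeat(factor, 2)


def downsample(mask, factor=2):
    """Return to scene resolution: a voxel is occupied if any sub-voxel is."""
    x, y, z = mask.shape
    blocks = mask.reshape(x // factor, factor, y // factor, factor,
                          z // factor, factor)
    return blocks.any(axis=(1, 3, 5))


def integrate(scene, instances):
    """IS step (toy): write refined instance voxels back into the scene."""
    out = scene.copy()
    for label, lo, hi, mask in instances:
        region = out[lo[0]:hi[0], lo[1]:hi[1], lo[2]:hi[2]]  # view into out
        region[mask] = label
    return out


def scene_instance_scene(coarse_scene, iters=2):
    """Alternate SI and IS for a fixed number of iterations."""
    scene = complete_scene(coarse_scene)
    for _ in range(iters):
        refined = []
        for label, lo, hi, crop in extract_instances(scene):
            hr = complete_instance(upsample(crop))
            refined.append((label, lo, hi, downsample(hr)))
        scene = complete_scene(integrate(scene, refined))
    return scene
```

With the identity placeholders the loop returns its input unchanged; in a real system the two placeholder functions would be trained networks, and the iteration count would be a design choice.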

updated: Sun Jun 06 2021 13:53:01 GMT+0000 (UTC)

published: Thu Apr 08 2021 09:50:30 GMT+0000 (UTC)

arXiv
