IMENet: Joint 3D Semantic Scene Completion and 2D Semantic Segmentation through Iterative Mutual Enhancement

Jie Li; Laiyan Ding; Rui Huang

3D semantic scene completion and 2D semantic segmentation are two tightly correlated tasks that are both essential for indoor scene understanding, because they predict the same semantic classes using positively correlated high-level features. Current methods use 2D features, extracted from early-fused RGB-D images for 2D segmentation, to improve 3D scene completion. We argue that this sequential scheme does not ensure that the two tasks fully benefit each other, and we present an Iterative Mutual Enhancement Network (IMENet) that solves them jointly by interactively refining both tasks at the late prediction stage. Specifically, two refinement modules are developed under a unified framework for the two tasks. The first is a 2D Deformable Context Pyramid (DCP) module, which receives the projection of the current 3D predictions to refine the 2D predictions. In turn, a 3D Deformable Depth Attention (DDA) module is proposed to leverage the reprojected 2D predictions to update the coarse 3D predictions. This iterative fusion operates on the stable high-level features of both tasks at a late stage. Extensive experiments on the NYU and NYUCAD datasets verify the effectiveness of the proposed iterative late fusion scheme, and our approach outperforms the state of the art on both 3D semantic scene completion and 2D semantic segmentation.
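The abstract describes the iterative late-fusion loop only at a high level. As a rough illustration of the data flow (3D to 2D projection, 2D refinement, 2D to 3D reprojection, 3D update, repeated), here is a minimal NumPy sketch under loudly stated assumptions: the pixel-to-voxel mapping `pix2vox` is a hypothetical precomputed array (in practice it would come from depth and camera intrinsics), and `refine` is a simple blending placeholder, not the learned DCP or DDA modules. All function names are hypothetical.

```python
import numpy as np


def project_3d_to_2d(pred_3d, pix2vox):
    """Gather voxel class scores onto the image plane.

    pred_3d: (C, X, Y, Z) voxel class scores.
    pix2vox: (H, W) flat index of the voxel each pixel's depth falls in,
             or -1 where depth is missing (hypothetical precomputed mapping).
    Returns (C, H, W); pixels without a voxel get zeros.
    """
    c = pred_3d.shape[0]
    flat = pred_3d.reshape(c, -1)
    valid = pix2vox >= 0
    out = np.zeros((c,) + pix2vox.shape, dtype=pred_3d.dtype)
    out[:, valid] = flat[:, pix2vox[valid]]
    return out


def reproject_2d_to_3d(pred_2d, pix2vox, grid_shape):
    """Scatter pixel class scores into the voxel grid, averaging pixels
    that fall in the same voxel. Also returns a mask of voxels hit."""
    c = pred_2d.shape[0]
    n_vox = int(np.prod(grid_shape))
    valid = pix2vox >= 0
    idx = pix2vox[valid]
    sums = np.zeros((c, n_vox))
    for k in range(c):
        np.add.at(sums[k], idx, pred_2d[k][valid])  # unbuffered scatter-add
    counts = np.bincount(idx, minlength=n_vox)      # pixels per voxel
    out = sums / np.maximum(counts, 1)
    return out.reshape((c,) + tuple(grid_shape)), (counts > 0).reshape(grid_shape)


def refine(pred, guide, mask, weight=0.5):
    """Hypothetical placeholder for the learned DCP / DDA modules: blend the
    prediction with the guide from the other task where the guide is defined."""
    blended = (1.0 - weight) * pred + weight * guide
    return np.where(mask, blended, pred)


def iterative_mutual_enhancement(pred_2d, pred_3d, pix2vox, n_iters=2):
    grid_shape = pred_3d.shape[1:]
    pix_mask = pix2vox >= 0
    for _ in range(n_iters):
        # 3D -> 2D: refine the 2D prediction with the projected 3D prediction
        guide_2d = project_3d_to_2d(pred_3d, pix2vox)
        pred_2d = refine(pred_2d, guide_2d, pix_mask)
        # 2D -> 3D: update the 3D prediction with the reprojected 2D prediction
        guide_3d, vox_mask = reproject_2d_to_3d(pred_2d, pix2vox, grid_shape)
        pred_3d = refine(pred_3d, guide_3d, vox_mask)
    return pred_2d, pred_3d


# Toy usage: 3 classes, a 4x5 image, and a 2x2x2 voxel grid.
rng = np.random.default_rng(0)
C, H, W, grid = 3, 4, 5, (2, 2, 2)
pred_2d = rng.random((C, H, W))
pred_3d = rng.random((C,) + grid)
pix2vox = rng.integers(-1, 8, size=(H, W))  # -1 marks pixels without valid depth
out_2d, out_3d = iterative_mutual_enhancement(pred_2d, pred_3d, pix2vox)
print(out_2d.shape, out_3d.shape)  # (3, 4, 5) (3, 2, 2, 2)
```

The guides are masked so that pixels without depth and voxels seen by no pixel keep their current scores. In IMENet these update steps are performed by learned modules (DCP with a deformable context pyramid, DDA with deformable depth attention), which this sketch does not attempt to reproduce.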

updated: Tue Jun 29 2021 13:34:20 GMT+0000 (UTC)

published: Tue Jun 29 2021 13:34:20 GMT+0000 (UTC)

arXiv