Lightweight integration of 3D features to improve 2D image segmentation

Olivier Pradelle; Raphaelle Chaine; David Wendland; Julie Digne

Scene understanding has made tremendous progress over the past few years, as data acquisition systems now provide an increasing amount of data in various modalities (point clouds, depth, RGB, etc.). However, this improvement comes at a large cost in computational resources and data annotation requirements. To analyze geometric information and images jointly, many approaches rely on both a 2D loss and a 3D loss, requiring not only per-pixel 2D labels but also per-point 3D labels. However, obtaining a 3D ground truth is challenging, time-consuming and error-prone. In this paper, we show that image segmentation can benefit from 3D geometric information without requiring a 3D ground truth, by training the geometric feature extraction and the 2D segmentation network jointly, in an end-to-end fashion, using only the 2D segmentation loss. Our method starts by extracting a map of 3D features directly from a provided point cloud using a lightweight 3D neural network. The 3D feature map, merged with the RGB image, is then used as input to a classical image segmentation network. Our method can be applied to many 2D segmentation networks, significantly improving their performance with only a marginal increase in network weights and light input dataset requirements, since no 3D ground truth is required.
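The data flow described in the abstract (per-point 3D features turned into an image-aligned feature map, then merged with the RGB channels) can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not say how the feature map is aligned with the image, so the sketch assumes a pinhole camera with hypothetical intrinsics `fx`, `fy`, `cx`, `cy`, and it takes the per-point features as given instead of learning them. The function names `project_features` and `merge_with_rgb` are made up for illustration. In the method itself, the features come from a lightweight 3D network trained jointly with the segmentation network using only the 2D loss; a practical version would use a tensor library so that gradients can flow back to that network.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]  # (x, y, z) in camera coordinates


def project_features(points: Sequence[Point], feats: Sequence[Sequence[float]],
                     fx: float, fy: float, cx: float, cy: float,
                     height: int, width: int) -> List[List[List[float]]]:
    """Scatter per-point features into a (height, width, C) map.

    Assumption: pinhole projection; when several points fall on the same
    pixel, the closest one (smallest z) wins. Empty pixels stay at zero.
    """
    channels = len(feats[0]) if feats else 0
    fmap = [[[0.0] * channels for _ in range(width)] for _ in range(height)]
    depth = [[float("inf")] * width for _ in range(height)]
    for (x, y, z), f in zip(points, feats):
        if z <= 0:  # point behind the camera: not visible
            continue
        u = int(round(fx * x / z + cx))  # pixel column
        v = int(round(fy * y / z + cy))  # pixel row
        if 0 <= u < width and 0 <= v < height and z < depth[v][u]:
            depth[v][u] = z
            fmap[v][u] = list(f)
    return fmap


def merge_with_rgb(rgb: List[List[List[float]]],
                   fmap: List[List[List[float]]]) -> List[List[List[float]]]:
    """Concatenate RGB and 3D feature channels pixel by pixel."""
    return [[list(rgb[v][u]) + list(fmap[v][u]) for u in range(len(rgb[0]))]
            for v in range(len(rgb))]


# Toy example: three points, two feature channels, a 3x3 image.
points = [(0.0, 0.0, 1.0), (0.0, 0.0, 2.0), (1.0, 0.0, 1.0)]
feats = [[1.0, 2.0], [9.0, 9.0], [3.0, 4.0]]
rgb = [[[0.5, 0.5, 0.5] for _ in range(3)] for _ in range(3)]

fmap = project_features(points, feats, fx=1.0, fy=1.0, cx=1.0, cy=1.0,
                        height=3, width=3)
merged = merge_with_rgb(rgb, fmap)  # 3 x 3 x 5 input for a 2D segmentation network
```

The merged array has 3 + C channels per pixel, so under these assumptions the only change a 2D segmentation network would need is a wider first layer, which is in line with the abstract's claim of a marginal increase in network weights.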

updated: Mon Jul 10 2023 08:38:08 GMT+0000 (UTC)

published: Fri Dec 16 2022 08:22:55 GMT+0000 (UTC)

arXiv
