Learning 3D Semantic Segmentation with only 2D Image Supervision

Kyle Genova; Xiaoqi Yin; Abhijit Kundu; Caroline Pantofaru; Forrester Cole; Avneesh Sud; Brian Brewington; Brian Shucker; Thomas Funkhouser

With the recent growth of urban mapping and autonomous driving efforts, there has been an explosion of raw 3D data collected from terrestrial platforms with lidar scanners and color cameras. However, due to high labeling costs, ground-truth 3D semantic segmentation annotations are limited in both quantity and geographic diversity, while also being difficult to transfer across sensors. In contrast, large image collections with ground-truth semantic segmentations are readily available for diverse sets of scenes. In this paper, we investigate how to use only those labeled 2D image collections to supervise training 3D semantic segmentation models. Our approach is to train a 3D model from pseudo-labels derived from 2D semantic image segmentations using multiview fusion. We address several novel issues with this approach, including how to select trusted pseudo-labels, how to sample 3D scenes with rare object categories, and how to decouple input features from 2D images from pseudo-labels during training. The proposed network architecture, 2D3DNet, achieves significantly better performance (+6.2-11.4 mIoU) than baselines during experiments on a new urban dataset with lidar and images captured in 20 cities across 5 continents.
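The pseudo-labeling step described above (multiview fusion followed by selection of trusted labels) can be illustrated with a minimal sketch. The abstract does not specify the fusion rule, so everything here is an assumption made for illustration: each view is a hypothetical pair of a 3x4 camera projection matrix and a per-pixel class-id map, labels are fused by simple majority vote, and the `min_votes` and `min_agreement` thresholds are invented stand-ins for "trusted" selection. Occlusion handling, which a real pipeline would need, is omitted.

```python
from collections import Counter

IGNORE = -1  # hypothetical marker for points left without a pseudo-label


def project(P, point):
    """Project a 3D point with a 3x4 camera matrix; return (col, row) or None."""
    x, y, z = point
    u, v, w = (r[0] * x + r[1] * y + r[2] * z + r[3] for r in P)
    if w <= 0:  # point lies behind the camera
        return None
    return int(round(u / w)), int(round(v / w))


def fuse_pseudo_labels(points, views, min_votes=2, min_agreement=0.6):
    """Vote 2D labels onto 3D points; keep only labels the views agree on.

    points: list of (x, y, z) tuples
    views:  list of (P, label_map), where label_map[row][col] is a class id
    Thresholds are illustrative assumptions, not values from the paper.
    """
    fused = []
    for point in points:
        votes = Counter()
        for P, label_map in views:
            pixel = project(P, point)
            if pixel is None:
                continue
            col, row = pixel
            # Count a vote only if the projection lands inside the image.
            if 0 <= row < len(label_map) and 0 <= col < len(label_map[0]):
                votes[label_map[row][col]] += 1
        total = sum(votes.values())
        if total < min_votes:  # too few observations to trust
            fused.append(IGNORE)
            continue
        label, count = votes.most_common(1)[0]
        # Keep the majority label only when enough views agree on it.
        fused.append(label if count / total >= min_agreement else IGNORE)
    return fused


if __name__ == "__main__":
    # Toy camera: pixel = (x / z, y / z). Two 2x2 label maps.
    P = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0]]
    views = [(P, [[0, 0], [0, 3]]), (P, [[5, 0], [0, 3]])]
    points = [(1, 1, 1), (0, 0, 1), (5, 5, 1)]
    print(fuse_pseudo_labels(points, views))  # [3, -1, -1]
```

In the toy run, the first point is seen as class 3 in both views and keeps that label; the second receives conflicting votes and the third projects outside both images, so both are marked `IGNORE` and would be excluded from the 3D training loss under this scheme.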

updated: Thu Oct 21 2021 17:56:28 GMT+0000 (UTC)

published: Thu Oct 21 2021 17:56:28 GMT+0000 (UTC)

arXiv
