Scalable Scene Flow from Point Clouds in the Real World

Philipp Jund; Chris Sweeney; Nichola Abdo; Zhifeng Chen; Jonathon Shlens

Autonomous vehicles operate in highly dynamic environments, which requires an accurate assessment of which parts of a scene are moving and where they are moving to. A popular approach to 3D motion estimation, termed scene flow, is to employ 3D point cloud data from consecutive LiDAR scans, although such approaches have been limited by the small size of real-world, annotated LiDAR datasets. In this work, we introduce a new large-scale benchmark for scene flow based on the Waymo Open Dataset. The dataset is ∼1,000× larger than previous real-world datasets in terms of the number of annotated frames, and its annotations are derived from the corresponding tracked 3D objects. We demonstrate how previous works were bounded by the amount of real LiDAR data available, suggesting that larger datasets are required to achieve state-of-the-art predictive performance. Furthermore, we show how previous heuristics for operating on point clouds, such as artificial down-sampling, heavily degrade performance, motivating a new class of models that are tractable on the full point cloud. To address this issue, we introduce the FastFlow3D architecture, which provides real-time inference on the full point cloud. Finally, we demonstrate that this problem is amenable to techniques from semi-supervised learning by highlighting open problems in generalizing methods to predict motion on unlabeled objects. We hope that this dataset may provide new opportunities for developing real-world scene flow systems and motivate a new class of machine learning problems.
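To make the idea of deriving scene flow from tracked 3D objects concrete, the following is a minimal sketch and not the authors' pipeline. It assumes each tracked box is described by a hypothetical pose tuple `(cx, cy, cz, yaw)`, that objects move rigidly between two scans, and that the per-point label is a velocity obtained by dividing the displacement by the scan interval `dt`. The function names are illustrative only.

```python
import math

# Hypothetical pose format for a tracked box: (cx, cy, cz, yaw), yaw about the z axis.

def to_box_frame(point, pose):
    """Express a world-frame point in the local frame of a box."""
    cx, cy, cz, yaw = pose
    dx, dy, dz = point[0] - cx, point[1] - cy, point[2] - cz
    c, s = math.cos(-yaw), math.sin(-yaw)  # rotate by -yaw to undo the box heading
    return (dx * c - dy * s, dx * s + dy * c, dz)

def to_world_frame(local, pose):
    """Express a box-frame point in the world frame."""
    cx, cy, cz, yaw = pose
    c, s = math.cos(yaw), math.sin(yaw)
    x, y, z = local
    return (x * c - y * s + cx, x * s + y * c + cy, z + cz)

def flow_from_tracked_box(point, pose_prev, pose_cur, dt):
    """Velocity label for a point observed inside a box in the current scan.

    Assumption: the object is rigid, so the point keeps the same box-frame
    coordinates in both scans. Its previous world position follows from the
    previous box pose, and the flow is the displacement divided by dt.
    """
    local = to_box_frame(point, pose_cur)
    prev_point = to_world_frame(local, pose_prev)
    return tuple((cur - prev) / dt for cur, prev in zip(point, prev_point))

if __name__ == "__main__":
    # A box that moved 1 m along x between two scans taken 0.1 s apart.
    print(flow_from_tracked_box((1.5, 0.2, 0.0), (0, 0, 0, 0), (1, 0, 0, 0), 0.1))
```

Under these assumptions, points outside every tracked box would simply receive zero flow. The sketch also leaves out ego-motion compensation, which a real pipeline would need to handle.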

updated: Mon Mar 01 2021 20:56:05 GMT+0000 (UTC)

published: Mon Mar 01 2021 20:56:05 GMT+0000 (UTC)

arXiv
