Learning Scene Dynamics from Point Cloud Sequences

Pan He; Patrick Emami; Sanjay Ranka; Anand Rangarajan

Understanding 3D scenes is a critical prerequisite for autonomous agents. Recently, LiDAR and other sensors have made large amounts of data available in the form of temporal sequences of point cloud frames. In this work, we propose a novel problem -- sequential scene flow estimation (SSFE) -- that aims to predict 3D scene flow for all pairs of point clouds in a given sequence. This differs from the previously studied problem of scene flow estimation, which focuses on two frames. We introduce the SPCM-Net architecture, which solves this problem by computing multi-scale spatiotemporal correlations between neighboring point clouds and then aggregating the correlations across time with an order-invariant recurrent unit. Our experimental evaluation confirms that recurrent processing of point cloud sequences results in significantly better SSFE than using only two frames. Additionally, we demonstrate that this approach can be effectively modified for sequential point cloud forecasting (SPF), a related problem that requires forecasting future point cloud frames. We evaluate our approach on a new benchmark for both SSFE and SPF consisting of synthetic and real datasets. Previously, datasets for scene flow estimation have been limited to two frames. We provide non-trivial extensions to these datasets for multi-frame estimation and prediction. Due to the difficulty of obtaining ground truth motion for real-world datasets, we use self-supervised training and evaluation metrics. We believe that this benchmark will be pivotal to future research in this area. All code for the benchmark and models will be made accessible.
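
The abstract does not say which self-supervised metrics are used. As a rough illustration only, the sketch below assumes a symmetric Chamfer distance, a common choice when ground-truth motion is unavailable. Each frame is warped by its predicted flow and compared with the next frame. The function names (`chamfer_distance`, `warp`, `sequence_self_supervised_error`) and the restriction to consecutive frame pairs are hypothetical simplifications, not the authors' implementation.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def squared_distance(p: Point, q: Point) -> float:
    """Squared Euclidean distance between two 3D points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def chamfer_distance(source: Sequence[Point], target: Sequence[Point]) -> float:
    """Symmetric Chamfer distance (assumed metric, not confirmed by the abstract).

    Sums the mean squared nearest-neighbor distance in both directions.
    Brute force, O(len(source) * len(target)).
    """
    if not source or not target:
        raise ValueError("both point clouds must be non-empty")
    forward = sum(min(squared_distance(p, q) for q in target) for p in source) / len(source)
    backward = sum(min(squared_distance(q, p) for p in source) for q in target) / len(target)
    return forward + backward


def warp(cloud: Sequence[Point], flow: Sequence[Point]) -> List[Point]:
    """Move every point by its per-point 3D flow vector."""
    if len(cloud) != len(flow):
        raise ValueError("flow must contain one vector per point")
    return [(x + dx, y + dy, z + dz) for (x, y, z), (dx, dy, dz) in zip(cloud, flow)]


def sequence_self_supervised_error(
    frames: Sequence[Sequence[Point]],
    flows: Sequence[Sequence[Point]],
) -> float:
    """Average Chamfer distance between each warped frame and its successor.

    flows[t] is the predicted flow from frames[t] to frames[t + 1]
    (hypothetical layout covering consecutive pairs only).
    """
    if len(flows) != len(frames) - 1:
        raise ValueError("expected one flow field per consecutive frame pair")
    errors = [
        chamfer_distance(warp(frames[t], flows[t]), frames[t + 1])
        for t in range(len(flows))
    ]
    return sum(errors) / len(errors)
```

For a three-frame sequence in which every point moves 0.5 along x per frame, the true flow gives an error of zero, while an all-zero flow prediction is penalized.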

updated: Tue Nov 16 2021 19:52:46 GMT+0000 (UTC)

published: Tue Nov 16 2021 19:52:46 GMT+0000 (UTC)

arXiv

