MPPNet: Multi-Frame Feature Intertwining with Proxy Points for 3D Temporal Object Detection

Xuesong Chen; Shaoshuai Shi; Benjin Zhu; Ka Chun Cheung; Hang Xu; Hongsheng Li

Accurate and reliable 3D detection is vital for many applications, including autonomous vehicles and service robots. In this paper, we present MPPNet, a flexible and high-performance framework for 3D temporal object detection with point cloud sequences. We propose a novel three-hierarchy framework that uses proxy points for multi-frame feature encoding and interaction to achieve better detection. The three hierarchies conduct per-frame feature encoding, short-clip feature fusion, and whole-sequence feature aggregation, respectively. To process long point cloud sequences with reasonable computational resources, we propose intra-group feature mixing and inter-group feature attention, which form the second and third feature encoding hierarchies and are applied recurrently to aggregate multi-frame trajectory features. The proxy points not only act as consistent object representations for each frame, but also serve as couriers that facilitate feature interaction between frames. Experiments on the large Waymo Open dataset show that our approach outperforms state-of-the-art methods by large margins when applied to both short (e.g., 4-frame) and long (e.g., 16-frame) point cloud sequences. Specifically, MPPNet achieves 74.21%, 74.62%, and 73.31% for the vehicle, pedestrian, and cyclist classes on the LEVEL 2 mAPH metric with 16-frame input.
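The grouping idea in the abstract (short clips fused first, then the whole sequence aggregated) can be illustrated with a toy sketch. This is not the authors' implementation: every function name here is hypothetical, each frame is reduced to a single feature vector rather than a set of proxy points, a plain mean stands in for the learned intra-group mixing, and an unparameterized dot-product softmax stands in for the learned inter-group attention.

```python
import math


def split_into_groups(frames, group_size):
    """Split per-frame feature vectors into consecutive short clips."""
    if len(frames) % group_size != 0:
        raise ValueError("sequence length must be a multiple of group_size")
    return [frames[i:i + group_size] for i in range(0, len(frames), group_size)]


def intra_group_mix(group):
    """Assumed stand-in for intra-group mixing: element-wise mean over a clip."""
    dim = len(group[0])
    return [sum(frame[d] for frame in group) / len(group) for d in range(dim)]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def inter_group_attention(group_feats):
    """Assumed stand-in for inter-group attention: every clip attends to all clips."""
    dim = len(group_feats[0])
    scale = math.sqrt(dim)
    attended = []
    for query in group_feats:
        scores = [sum(a * b for a, b in zip(query, key)) / scale
                  for key in group_feats]
        weights = softmax(scores)
        attended.append([sum(w * value[d] for w, value in zip(weights, group_feats))
                         for d in range(dim)])
    return attended


def aggregate_sequence(frames, group_size=4):
    """Fuse frames within each clip, exchange information across clips, then pool."""
    groups = split_into_groups(frames, group_size)
    mixed = [intra_group_mix(g) for g in groups]
    attended = inter_group_attention(mixed)
    dim = len(attended[0])
    return [sum(f[d] for f in attended) / len(attended) for d in range(dim)]


# Toy input: 16 frames, each with a 2-dimensional feature vector.
frames = [[float(i), 1.0] for i in range(16)]
sequence_feature = aggregate_sequence(frames, group_size=4)
```

With 16 frames and a group size of 4, attention in this sketch runs over 4 clip features rather than 16 frame features, which is the kind of cost reduction the grouping is meant to provide for long sequences.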

updated: Thu May 12 2022 09:38:42 GMT+0000 (UTC)

published: Thu May 12 2022 09:38:42 GMT+0000 (UTC)

arXiv
