Practical Collaborative Perception: A Framework for Asynchronous and Multi-Agent 3D Object Detection

Minh-Quan Dao; Julie Stephany Berrio; Vincent Frémont; Mao Shan; Elwan Héry; Stewart Worrall

Occlusion is a major challenge for LiDAR-based object detection methods. This challenge becomes safety-critical in urban traffic, where the ego vehicle must detect objects reliably to avoid collisions while its field of view is severely reduced by the obstruction posed by a large number of road users. Collaborative perception via Vehicle-to-Everything (V2X) communication is an appealing solution: it leverages the diverse perspectives of connected agents present at multiple locations to form a complete scene representation. State-of-the-art V2X methods resolve the performance-bandwidth tradeoff using a mid-collaboration approach in which Bird's-Eye View images of point clouds are exchanged. Bandwidth consumption is then lower than communicating point clouds, as in early collaboration, and, thanks to a deeper interaction among connected agents, detection performance is higher than in late collaboration, which fuses agents' outputs. While achieving strong performance, most mid-collaboration approaches are hindered in real-world deployment by their overly complicated architectures, which involve learnable collaboration graphs and autoencoder-based compressors/decompressors, and by unrealistic assumptions about inter-agent synchronization. In this work, we devise a simple yet effective collaboration method that achieves a better bandwidth-performance tradeoff than prior state-of-the-art methods while minimizing changes made to single-vehicle detection models and relaxing unrealistic assumptions on inter-agent synchronization. Experiments on the V2X-Sim dataset show that our collaboration method achieves 98% of the performance of an early-collaboration method while consuming only the bandwidth equivalent of a late-collaboration method.
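To make the bandwidth gap between early and late collaboration concrete, the following is a minimal back-of-the-envelope sketch, not taken from the paper. It compares the per-frame payload of sending a raw point cloud (early collaboration) with sending only detected boxes (late collaboration). All sizes are hypothetical assumptions chosen for illustration: 30,000 points per sweep with 4 values each, 20 boxes with 8 values each, and uncompressed 32-bit floats. The function names are likewise illustrative.

```python
from dataclasses import dataclass

BYTES_PER_FLOAT32 = 4  # assumption: payloads are sent as uncompressed float32


@dataclass(frozen=True)
class FrameSizes:
    """Hypothetical per-frame sizes; none of these numbers come from the paper."""
    num_points: int = 30_000  # LiDAR points in one sweep (assumed)
    point_dims: int = 4       # x, y, z, intensity (assumed layout)
    num_boxes: int = 20       # detected objects in one frame (assumed)
    box_dims: int = 8         # x, y, z, length, width, height, yaw, score (assumed)


def early_payload_bytes(sizes: FrameSizes) -> int:
    """Early collaboration: the agent transmits its raw point cloud."""
    return sizes.num_points * sizes.point_dims * BYTES_PER_FLOAT32


def late_payload_bytes(sizes: FrameSizes) -> int:
    """Late collaboration: the agent transmits only its detection output."""
    return sizes.num_boxes * sizes.box_dims * BYTES_PER_FLOAT32


sizes = FrameSizes()
early = early_payload_bytes(sizes)
late = late_payload_bytes(sizes)
ratio = early / late

print(f"early: {early} bytes, late: {late} bytes, ratio: {ratio:.0f}x")
# prints "early: 480000 bytes, late: 640 bytes, ratio: 750x"
```

Under these assumed sizes the raw point cloud is several hundred times larger than the box list, which is the gap the abstract refers to when it contrasts early-collaboration performance with late-collaboration bandwidth. Real payloads depend on the sensor, the scene, and any compression applied.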

updated: Sun Jul 09 2023 07:41:26 GMT+0000 (UTC)

published: Tue Jul 04 2023 03:49:42 GMT+0000 (UTC)

arXiv
