Learning Distilled Collaboration Graph for Multi-Agent Perception

Yiming Li; Shunli Ren; Pengxiang Wu; Siheng Chen; Chen Feng; Wenjun Zhang

To promote a better performance-bandwidth trade-off in multi-agent perception, we propose a novel distilled collaboration graph (DiscoGraph) to model trainable, pose-aware, and adaptive collaboration among agents. Our key novelties lie in two aspects. First, we propose a teacher-student framework to train DiscoGraph via knowledge distillation. The teacher model employs early collaboration with holistic-view inputs; the student model is based on intermediate collaboration with single-view inputs. Our framework trains DiscoGraph by constraining the post-collaboration feature maps in the student model to match their counterparts in the teacher model. Second, we propose a matrix-valued edge weight in DiscoGraph. In such a matrix, each element reflects the inter-agent attention at a specific spatial region, allowing an agent to adaptively highlight the informative regions. During inference, we only need the student model, named the distilled collaboration network (DiscoNet). Thanks to the teacher-student framework, multiple agents sharing the same DiscoNet can collaboratively approach the performance of a hypothetical teacher model with a holistic view. Our approach is validated on V2X-Sim 1.0, a large-scale multi-agent perception dataset that we synthesized using CARLA and SUMO co-simulation. Quantitative and qualitative experiments in multi-agent 3D object detection show that DiscoNet not only achieves a better performance-bandwidth trade-off than state-of-the-art collaborative perception methods, but also offers a more straightforward design rationale. Our code is available at https://github.com/ai4ce/DiscoNet.
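The two ideas in the abstract can be illustrated with a toy sketch. It is not the DiscoNet implementation: the function names, the single-channel feature maps, the per-cell softmax over agents, and the mean-squared-error distillation loss are all assumptions chosen for brevity, since the abstract does not specify how the attention is normalized or which loss is used. The sketch shows an edge weight that is a matrix with one entry per spatial cell, and a loss that pulls the fused student feature map toward a teacher feature map.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_features(features, scores):
    """Fuse per-agent feature maps using matrix-valued edge weights.

    features[k][i][j]: feature of agent k at spatial cell (i, j), assumed to be
                       already warped into the ego agent's coordinate frame.
    scores[k][i][j]:   hypothetical raw attention score for agent k at (i, j).
    Returns (fused, weights); weights[k] is an H x W matrix for agent k.
    """
    n, h, w = len(features), len(features[0]), len(features[0][0])
    weights = [[[0.0] * w for _ in range(h)] for _ in range(n)]
    fused = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            # Assumption: normalize attention across agents at each cell.
            cell = softmax([scores[k][i][j] for k in range(n)])
            for k in range(n):
                weights[k][i][j] = cell[k]
                fused[i][j] += cell[k] * features[k][i][j]
    return fused, weights


def distillation_loss(student, teacher):
    """Mean squared error between two feature maps (a stand-in loss)."""
    h, w = len(student), len(student[0])
    total = sum((student[i][j] - teacher[i][j]) ** 2
                for i in range(h) for j in range(w))
    return total / (h * w)


# Two agents, each with a 1 x 2 single-channel feature map.
features = [[[1.0, 0.0]], [[0.0, 2.0]]]
teacher = [[1.0, 2.0]]  # hypothetical holistic-view feature map

# Uniform scores: every cell averages the two agents.
uniform = [[[0.0, 0.0]], [[0.0, 0.0]]]
fused_uniform, _ = fuse_features(features, uniform)
print(fused_uniform, distillation_loss(fused_uniform, teacher))

# Region-specific scores: each cell attends to the agent that observed it.
selective = [[[10.0, -10.0]], [[-10.0, 10.0]]]
fused_selective, _ = fuse_features(features, selective)
print(fused_selective, distillation_loss(fused_selective, teacher))
```

In this toy setting the region-specific scores yield a much smaller loss than the uniform ones, which is the intuition behind letting each spatial cell carry its own attention weight rather than using one scalar per agent pair.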

updated: Sat Jan 15 2022 17:21:11 GMT+0000 (UTC)

published: Mon Nov 01 2021 01:13:19 GMT+0000 (UTC)

arXiv
