DeepInteraction: 3D Object Detection via Modality Interaction

Zeyu Yang; Jiaqi Chen; Zhenwei Miao; Wei Li; Xiatian Zhu; Li Zhang

Existing top-performing 3D object detectors typically rely on a multi-modal fusion strategy. This design is fundamentally limited, however, because it overlooks useful modality-specific information, which ultimately hampers model performance. To address this limitation, we introduce a novel modality interaction strategy in which individual per-modality representations are learned and maintained throughout, so that their unique characteristics can be exploited during object detection. To realize this strategy, we design the DeepInteraction architecture, characterized by a multi-modal representational interaction encoder and a multi-modal predictive interaction decoder. Experiments on the large-scale nuScenes dataset show that the proposed method surpasses all prior art, often by a large margin. Crucially, our method is ranked first on the highly competitive nuScenes object detection leaderboard.
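
The contrast between fusion and interaction can be illustrated with a toy sketch. The code below is not the DeepInteraction implementation. It assumes the two modalities are LiDAR and camera (the abstract does not name them), uses plain dot-product cross-attention as a stand-in for the interaction step, and all function names (`cross_attend`, `interact`) are hypothetical. The only point it shows is that each modality is enriched by the other while both representations remain separate, rather than being merged into one.

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of floats.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def cross_attend(queries, context):
    # Each query vector gathers a softmax-weighted average of the context vectors.
    dim = len(queries[0])
    out = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, c)) / math.sqrt(dim) for c in context]
        weights = softmax(scores)
        out.append([sum(w * c[i] for w, c in zip(weights, context)) for i in range(dim)])
    return out

def interact(lidar, camera):
    # One interaction step (illustrative assumption): each modality receives a
    # message from the other and adds it residually, but the two sets of
    # features are returned separately instead of being fused.
    lidar_msg = cross_attend(lidar, camera)
    camera_msg = cross_attend(camera, lidar)
    new_lidar = [[x + m for x, m in zip(f, msg)] for f, msg in zip(lidar, lidar_msg)]
    new_camera = [[x + m for x, m in zip(f, msg)] for f, msg in zip(camera, camera_msg)]
    return new_lidar, new_camera

# Toy features: 3 LiDAR tokens and 2 camera tokens, each of dimension 2.
lidar_feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
camera_feats = [[0.5, 0.5], [2.0, 0.0]]

new_lidar, new_camera = interact(lidar_feats, camera_feats)
print(len(new_lidar), len(new_camera))
```

A fusion-style design would instead collapse the two inputs into a single representation at this point. Here the token counts of both modalities are preserved, so a later stage can still draw on either one.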

updated: Tue Aug 23 2022 17:52:54 GMT+0000 (UTC)

published: Tue Aug 23 2022 17:52:54 GMT+0000 (UTC)

arXiv
