Fusion is Not Enough: Single-Modal Attacks to Compromise Fusion Models in Autonomous Driving

Zhiyuan Cheng; Hongjun Choi; James Liang; Shiwei Feng; Guanhong Tao; Dongfang Liu; Michael Zuzak; Xiangyu Zhang

Multi-sensor fusion (MSF) is widely adopted for perception in autonomous vehicles (AVs), particularly for 3D object detection with camera and LiDAR sensors. The rationale behind fusion is to capitalize on the strengths of each modality while mitigating their limitations. Advanced deep neural network (DNN)-based fusion techniques have demonstrated leading performance. Fusion models are also perceived as more robust to attacks than single-modal ones because of the redundant information across modalities. In this work, we challenge this perspective with single-modal attacks that target the camera modality, which is considered less significant in fusion but more affordable for attackers. We argue that the weakest link of a fusion model is its most vulnerable modality, and we propose an attack framework that targets advanced camera-LiDAR fusion models with adversarial patches. Our approach employs a two-stage optimization-based strategy that first comprehensively assesses which image areas are vulnerable under adversarial attacks, and then applies customized attack strategies to different fusion models, generating deployable patches. Evaluations with five state-of-the-art camera-LiDAR fusion models on a real-world dataset show that our attacks successfully compromise all models. Our approach can either reduce the mean average precision (mAP) of detection from 0.824 to 0.353 or degrade the detection score of the target object from 0.727 to 0.151 on average, demonstrating the effectiveness and practicality of the proposed attack framework.
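To make the two-stage idea in the abstract concrete, the following is a minimal toy sketch, not the authors' implementation. It assumes a hypothetical logistic "detection score" over a flattened 64-pixel image in place of a real camera-LiDAR fusion model. Stage 1 picks the contiguous window where the score is most sensitive to pixel changes, and stage 2 runs projected gradient descent on the pixels in that window only. All function names, the model, and the hyperparameters are illustrative assumptions.

```python
import math
import random

def score(weights, bias, image):
    """Toy 'detection score': a logistic model over pixels (an assumption)."""
    z = bias + sum(w * x for w, x in zip(weights, image))
    return 1.0 / (1.0 + math.exp(-z))

def score_gradient(weights, bias, image):
    """Gradient of the toy score with respect to each pixel."""
    s = score(weights, bias, image)
    return [s * (1.0 - s) * w for w in weights]

def most_sensitive_window(weights, bias, image, size):
    """Stage 1 (toy): find the window with the largest total gradient magnitude."""
    grad = score_gradient(weights, bias, image)
    best_start, best_total = 0, -1.0
    for start in range(len(image) - size + 1):
        total = sum(abs(g) for g in grad[start:start + size])
        if total > best_total:
            best_start, best_total = start, total
    return best_start

def optimize_patch(weights, bias, image, start, size, steps=200, lr=0.5):
    """Stage 2 (toy): projected gradient descent on the patch pixels only."""
    attacked = list(image)
    for _ in range(steps):
        grad = score_gradient(weights, bias, attacked)
        for i in range(start, start + size):
            # Step against the gradient to lower the score, then clip to [0, 1].
            attacked[i] = min(1.0, max(0.0, attacked[i] - lr * grad[i]))
    return attacked

# Hypothetical data: random weights and a random "image", fixed seed.
rng = random.Random(0)
weights = [rng.uniform(-1.0, 1.0) for _ in range(64)]
image = [rng.uniform(0.0, 1.0) for _ in range(64)]
bias = 1.0

start = most_sensitive_window(weights, bias, image, size=8)
attacked = optimize_patch(weights, bias, image, start, size=8)
clean_score = score(weights, bias, image)
attacked_score = score(weights, bias, attacked)
print(f"patch at {start}, score {clean_score:.3f} -> {attacked_score:.3f}")
```

Only the eight pixels inside the chosen window change, which mirrors the constraint that a physical patch covers a limited image region. A real attack would replace the toy score with a fusion model's detection output and would need to account for physical deployment conditions.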

updated: Fri Apr 28 2023 03:39:00 GMT+0000 (UTC)

published: Fri Apr 28 2023 03:39:00 GMT+0000 (UTC)

arXiv
