Availability Attacks Against Neural Network Certifiers Based on Backdoors

Tobias Lorenz; Marta Kwiatkowska; Mario Fritz



To achieve reliable, robust, and safe AI systems, it is important to implement fallback strategies for cases where AI predictions cannot be trusted. Certifiers for neural networks are a reliable way to check the robustness of these predictions. For some predictions, they guarantee that a certain class of manipulations or attacks could not have changed the outcome. For the remaining predictions without guarantees, the method abstains from making a prediction and a fallback strategy needs to be invoked, which typically incurs additional costs, can require a human operator, or may even fail to provide any prediction. While this is a key concept towards safe and secure AI, we show for the first time that this approach comes with its own security risks, as such fallback strategies can be deliberately triggered by an adversary. Using training-time attacks, the adversary can significantly reduce the certified robustness of the model, making it unavailable. This transfers the main system load onto the fallback, reducing the overall system's integrity and availability. We design two novel backdoor attacks that show the practical relevance of these threats. For example, adding 1% poisoned data during training is sufficient to reduce certified robustness by up to 95 percentage points. Our extensive experiments across multiple datasets, model architectures, and certifiers demonstrate the wide applicability of these attacks. A first investigation into potential defenses shows that current approaches are insufficient to mitigate the issue, highlighting the need for new, more specific solutions.
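The certify-or-abstain pattern described in the abstract can be illustrated with a minimal sketch. It is not the paper's certifier or attack: it assumes a toy linear classifier, for which robustness within an L-infinity ball can be checked exactly, and the names `certify_linear` and `fallback_rate` are hypothetical.

```python
from typing import List, Optional


def certify_linear(weights: List[List[float]], biases: List[float],
                   x: List[float], eps: float) -> Optional[int]:
    """Return the predicted class if no L-infinity perturbation of size
    at most eps can change it; return None (abstain) otherwise.

    Assumption: the model is linear (logits = W x + b), so the check is exact.
    """
    logits = [sum(w_i * x_i for w_i, x_i in zip(w, x)) + b
              for w, b in zip(weights, biases)]
    pred = max(range(len(logits)), key=logits.__getitem__)
    for j in range(len(logits)):
        if j == pred:
            continue
        margin = logits[pred] - logits[j]
        # Largest possible drop of the margin: eps * ||w_pred - w_j||_1.
        worst_drop = eps * sum(abs(a - b)
                               for a, b in zip(weights[pred], weights[j]))
        if margin - worst_drop <= 0:
            return None  # no guarantee: the fallback must handle this input
    return pred


def fallback_rate(weights: List[List[float]], biases: List[float],
                  inputs: List[List[float]], eps: float) -> float:
    """Fraction of inputs on which the certifier abstains."""
    abstained = sum(certify_linear(weights, biases, x, eps) is None
                    for x in inputs)
    return abstained / len(inputs)
```

In this toy setting, anything that shrinks the certified fraction of inputs shows up directly as a higher `fallback_rate`, which is the load that the abstract describes being shifted onto the fallback system.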

updated: Sun Oct 02 2022 16:58:47 GMT+0000 (UTC)

published: Wed Aug 25 2021 15:49:10 GMT+0000 (UTC)

arXiv
