Bayesian Learning with Information Gain Provably Bounds Risk for a Robust Adversarial Defense

Bao Gia Doan; Ehsan Abbasnejad; Javen Qinfeng Shi; Damith C. Ranasinghe

We present a new algorithm to learn a deep neural network model robust against adversarial attacks. Previous algorithms demonstrate that an adversarially trained Bayesian Neural Network (BNN) provides improved robustness. We recognize that the adversarial learning approach for approximating the multi-modal posterior distribution of a Bayesian model can lead to mode collapse; consequently, the model's achievements in robustness and performance are sub-optimal. Instead, we first propose preventing mode collapse to better approximate the multi-modal posterior distribution. Second, based on the intuition that a robust model should ignore perturbations and only consider the informative content of the input, we conceptualize and formulate an information gain objective to measure and force the information learned from both benign and adversarial training instances to be similar. Importantly, we prove and demonstrate that minimizing the information gain objective allows the adversarial risk to approach the conventional empirical risk. We believe our efforts provide a step toward a basis for a principled method of adversarially training BNNs. Our model demonstrates significantly improved robustness (up to 20%) compared with adversarial training and Adv-BNN under PGD attacks with 0.035 distortion on both the CIFAR-10 and STL-10 datasets.
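
For readers unfamiliar with the PGD attack used in the evaluation, the following is a minimal sketch of a generic projected gradient descent attack, not the authors' implementation. It makes several assumptions that the abstract does not state: the 0.035 distortion is treated as an L-infinity bound, inputs are scaled to [0, 1], and the step size, number of steps, and toy logistic model (`logistic_loss`, `pgd_linf`) are hypothetical choices made only for illustration.

```python
import math
import random


def logistic_loss(w, b, x, y):
    """Binary cross-entropy of a toy logistic model; y is 0 or 1.

    Returns (loss, predicted probability). The model is a stand-in
    for a real network and is an assumption of this sketch.
    """
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    p = 1.0 / (1.0 + math.exp(-z))
    p = min(max(p, 1e-12), 1.0 - 1e-12)  # avoid log(0)
    loss = -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return loss, p


def sign(v):
    return (v > 0) - (v < 0)


def clip(v, lo, hi):
    return min(max(v, lo), hi)


def pgd_linf(w, b, x, y, eps=0.035, step=0.007, n_steps=10, seed=0):
    """Gradient-sign ascent on the loss, projected onto an L-infinity
    ball of radius eps around x (assumed norm) and onto [0, 1]."""
    rng = random.Random(seed)
    # Random start inside the eps-ball.
    x_adv = [clip(xi + rng.uniform(-eps, eps), 0.0, 1.0) for xi in x]
    for _ in range(n_steps):
        _, p = logistic_loss(w, b, x_adv, y)
        # Gradient of the loss w.r.t. the input for a logistic model.
        grad = [(p - y) * wi for wi in w]
        x_adv = [
            clip(clip(xa + step * sign(g), xi - eps, xi + eps), 0.0, 1.0)
            for xa, g, xi in zip(x_adv, grad, x)
        ]
    return x_adv


# Hypothetical toy input and weights.
x = [0.2, 0.5, 0.9, 0.4]
w = [1.5, -2.0, 0.7, 0.3]
x_adv = pgd_linf(w, b=0.1, x=x, y=1)
clean_loss, _ = logistic_loss(w, 0.1, x, 1)
adv_loss, _ = logistic_loss(w, 0.1, x_adv, 1)
```

In this toy setting the perturbed input stays within 0.035 of the original in every coordinate while the loss increases. This gap between the loss on perturbed and benign inputs is what the abstract calls the difference between adversarial risk and conventional empirical risk.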

updated: Mon Dec 05 2022 03:26:08 GMT+0000 (UTC)

published: Mon Dec 05 2022 03:26:08 GMT+0000 (UTC)

arXiv
