Single Image Backdoor Inversion via Robust Smoothed Classifiers

Mingjie Sun; Zico Kolter

Backdoor inversion, the process of finding a backdoor trigger inserted into a machine learning model, has become a pillar of many backdoor detection and defense methods. Previous work on backdoor inversion often recovers the backdoor through an optimization process that flips a support set of clean images into the target class. However, how large this support set must be to recover a successful backdoor has rarely been studied or understood. In this work, we show that one can reliably recover the backdoor trigger from as few as a single image. Specifically, we propose the SmoothInv method, which first constructs a robust smoothed version of the backdoored classifier and then performs guided image synthesis towards the target class to reveal the backdoor pattern. SmoothInv requires neither explicit modeling of the backdoor via a mask variable nor any complex regularization schemes, both of which have become standard practice in backdoor inversion methods. We perform both quantitative and qualitative studies on backdoored classifiers from previously published backdoor attacks. We demonstrate that, compared to existing methods, SmoothInv is able to recover successful backdoors from single images while maintaining high fidelity to the original backdoor. We also show how to identify the target backdoored class from the backdoored classifier. Finally, we propose and analyze two countermeasures to our approach and show that SmoothInv remains robust in the face of an adaptive attacker. Our code is available at https://github.com/locuslab/smoothinv.
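
The two stages named in the abstract (build a smoothed classifier, then synthesize a perturbation towards the target class from a single image) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation; see the linked repository for that. Everything here is an assumption made for illustration: a toy linear softmax classifier stands in for the backdoored network, smoothing is plain Gaussian noise averaging, the synthesis step is projected gradient ascent on the mean log-probability of the target class under an L2 budget, and all names (`smoothed_probs`, `invert_from_single_image`, `sigma`, `eps`, `step`) are hypothetical.

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over the last axis.
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def smoothed_probs(W, b, x, sigma, n_samples, rng):
    # Smoothed classifier (assumed form): average the class probabilities
    # over Gaussian-noised copies of the input.
    noisy = x + sigma * rng.standard_normal((n_samples, x.size))
    return softmax(noisy @ W.T + b).mean(axis=0)

def invert_from_single_image(W, b, x, target, sigma=0.25, eps=2.0,
                             step=0.2, n_steps=50, n_samples=64, seed=0):
    # Guided synthesis (assumed form): projected gradient ascent on the mean
    # log-probability of `target` over noisy copies of x + delta.
    rng = np.random.default_rng(seed)
    delta = np.zeros_like(x)
    onehot = np.eye(W.shape[0])[target]
    for _ in range(n_steps):
        noisy = x + delta + sigma * rng.standard_normal((n_samples, x.size))
        p = softmax(noisy @ W.T + b)               # shape (n_samples, classes)
        # For a linear softmax model, d log p_target / d input = (onehot - p) @ W.
        grad = ((onehot - p) @ W).mean(axis=0)
        delta = delta + step * grad / (np.linalg.norm(grad) + 1e-12)
        norm = np.linalg.norm(delta)
        if norm > eps:                             # project back onto the L2 ball
            delta = delta * (eps / norm)
    return delta

# Toy usage: random linear "classifier" and a single random "image".
data_rng = np.random.default_rng(42)
W = data_rng.standard_normal((3, 8))
b = np.zeros(3)
x = data_rng.standard_normal(8)

before = smoothed_probs(W, b, x, 0.25, 2000, np.random.default_rng(1))
target = int(np.argmin(before))                    # least likely class as the toy target
delta = invert_from_single_image(W, b, x, target)
after = smoothed_probs(W, b, x + delta, 0.25, 2000, np.random.default_rng(1))
print(before[target], after[target], np.linalg.norm(delta))
```

In this toy setting the recovered `delta` is just a direction that raises the target logit; the claim in the abstract is that for a backdoored deep classifier, the analogous perturbation of a robust smoothed model reveals the backdoor pattern.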

updated: Wed Mar 01 2023 03:37:42 GMT+0000 (UTC)

published: Wed Mar 01 2023 03:37:42 GMT+0000 (UTC)

arXiv

