Towards Effective and Robust Neural Trojan Defenses via Input Filtering

Kien Do; Haripriya Harikumar; Hung Le; Dung Nguyen; Truyen Tran; Santu Rana; Dang Nguyen; Willy Susilo; Svetha Venkatesh

Trojan attacks on deep neural networks are both dangerous and surreptitious. Over the past few years, Trojan attacks have advanced from using a single input-agnostic trigger and targeting only one class to using multiple input-specific triggers and targeting multiple classes. However, Trojan defenses have not caught up with this development. Most defense methods still make inadequate assumptions about Trojan triggers and target classes, and can therefore be easily circumvented by modern Trojan attacks. To deal with this problem, we propose two novel "filtering" defenses called Variational Input Filtering (VIF) and Adversarial Input Filtering (AIF), which leverage lossy data compression and adversarial learning, respectively, to effectively purify potential Trojan triggers in the input at run time without making assumptions about the number of triggers or target classes, or about whether the triggers are input-dependent. In addition, we introduce a new defense mechanism called "Filtering-then-Contrasting" (FtC), which helps avoid the drop in classification accuracy on clean data caused by "filtering", and combine it with VIF/AIF to derive new defenses of this kind. Extensive experimental results and ablation studies show that our proposed defenses significantly outperform well-known baseline defenses in mitigating five advanced Trojan attacks, including two recent state-of-the-art attacks, while being quite robust to small amounts of training data and large-norm triggers.
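The abstract does not spell out how "Filtering-then-Contrasting" reaches its decision. The sketch below illustrates one plausible reading, under the assumption that FtC compares the classifier's label on the raw input with its label on the filtered input and rejects the input when the two disagree, so that clean inputs keep the prediction made on unfiltered data. The names `filter_then_contrast`, `toy_classifier`, and `toy_filter` are hypothetical; the toy filter is a simple clamp standing in for a learned VIF/AIF filter, not the authors' model.

```python
from typing import Callable, List, Optional

Vector = List[float]


def filter_then_contrast(
    x: Vector,
    classify: Callable[[Vector], int],
    purify: Callable[[Vector], Vector],
) -> Optional[int]:
    """Return the label predicted on the raw input, or None if the input is rejected.

    Assumed decision rule (not taken from the abstract): a mismatch between
    the labels on the raw and the filtered input signals a possible trigger.
    """
    label_raw = classify(x)
    label_filtered = classify(purify(x))
    if label_raw != label_filtered:
        return None  # suspicious input: reject instead of classifying
    return label_raw  # clean input: keep the prediction on unfiltered data


def toy_classifier(x: Vector) -> int:
    """Hypothetical backdoored model: a large last value forces class 1."""
    if x[-1] > 0.9:
        return 1  # trigger present, attacker's target class
    body = x[:-1]
    return 1 if sum(body) / len(body) > 0.5 else 0


def toy_filter(x: Vector) -> Vector:
    """Stand-in for a learned lossy filter: clamp every value to at most 0.8."""
    return [min(v, 0.8) for v in x]


clean_input = [0.1, 0.2, 0.3, 0.1]
trojan_input = [0.1, 0.2, 0.3, 1.0]  # same content plus the toy trigger

clean_result = filter_then_contrast(clean_input, toy_classifier, toy_filter)
trojan_result = filter_then_contrast(trojan_input, toy_classifier, toy_filter)
```

In this toy setting the clean input keeps its original label, while the triggered input is rejected because filtering removes the trigger and changes the prediction. A real filter would be trained rather than hand-written, and its behavior on clean data would determine how often benign inputs are rejected by mistake.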

updated: Thu Jul 07 2022 20:45:29 GMT+0000 (UTC)

published: Thu Feb 24 2022 15:41:37 GMT+0000 (UTC)

arXiv
