Exploiting Explanations for Model Inversion Attacks

Xuejun Zhao; Wencan Zhang; Xiaokui Xiao; Brian Y. Lim

The successful deployment of artificial intelligence (AI) in many domains, from healthcare to hiring, requires its responsible use, particularly in model explanations and privacy. Explainable artificial intelligence (XAI) provides more information to help users understand model decisions, yet this additional knowledge exposes additional risks for privacy attacks. Hence, providing explanations harms privacy. We study this risk for image-based model inversion attacks and identify several attack architectures with increasing performance for reconstructing private image data from model explanations. We develop several multi-modal transposed CNN architectures that achieve significantly higher inversion performance than attacks that use only the target model's prediction. These XAI-aware inversion models are designed to exploit the spatial knowledge in image explanations. To understand which explanations carry higher privacy risk, we analyze how various explanation types and factors influence inversion performance. Although some target models do not provide explanations, we further demonstrate increased inversion performance even for these non-explainable models by exploiting explanations of surrogate models through attention transfer. This method first inverts an explanation from the target prediction, then reconstructs the target image. These threats highlight the urgent and significant privacy risks of explanations and call attention to new privacy preservation techniques that balance the dual requirement for AI explainability and privacy.
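The abstract describes the attack architectures only at a high level. As a rough illustration of the two ideas it names (multi-modal fusion of the target prediction with a spatial explanation in a transposed CNN decoder, and the two-stage inversion for non-explainable targets), here is a minimal, untrained NumPy forward-pass sketch. It is not the authors' architecture: the layer sizes, the weight names (`fc`, `up1`, `up2`, `expl_fc`), fusing the modalities by concatenating a pooled explanation as an extra channel, and the single dense layer used to predict an explanation are all hypothetical choices. A real attack would train these weights, presumably with a deep-learning framework.

```python
import numpy as np

# Hypothetical sizes chosen for illustration only.
NUM_CLASSES = 10  # length of the target model's prediction vector
IMG = 16          # side length of the reconstructed grayscale image
BASE = 4          # side length of the coarsest feature map
CH = 8            # channels in the prediction branch

rng = np.random.default_rng(0)


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def relu(z):
    return np.maximum(z, 0.0)


def conv_transpose2d(x, w, stride=2):
    """Transposed convolution without padding.

    x: (C_in, H, W); w: (C_in, C_out, k, k)
    returns (C_out, (H - 1) * stride + k, (W - 1) * stride + k)
    """
    _, h, wd = x.shape
    _, c_out, k, _ = w.shape
    out = np.zeros((c_out, (h - 1) * stride + k, (wd - 1) * stride + k))
    for i in range(h):
        for j in range(wd):
            # Each input location scatters a weighted k x k patch into the output.
            patch = np.einsum("c,cokl->okl", x[:, i, j], w)
            out[:, i * stride:i * stride + k, j * stride:j * stride + k] += patch
    return out


def avg_pool(e, factor):
    """Downsample a 2-D map by averaging non-overlapping factor x factor blocks."""
    h, w = e.shape
    return e.reshape(h // factor, factor, w // factor, factor).mean(axis=(1, 3))


# Random, untrained weights with hypothetical names; a real attack would train them.
params = {
    "fc": rng.normal(0.0, 0.1, (NUM_CLASSES, CH * BASE * BASE)),
    "up1": rng.normal(0.0, 0.1, (CH + 1, CH, 2, 2)),  # +1 input channel: explanation
    "up2": rng.normal(0.0, 0.1, (CH, 1, 2, 2)),
    "expl_fc": rng.normal(0.0, 0.1, (NUM_CLASSES, IMG * IMG)),
}


def invert_image(y, e, p):
    """Reconstruct an image from a prediction vector y and an explanation map e."""
    h = relu(y @ p["fc"]).reshape(CH, BASE, BASE)    # prediction branch
    e_small = avg_pool(e, IMG // BASE)[None]          # explanation branch, (1, 4, 4)
    z = np.concatenate([h, e_small], axis=0)          # fuse modalities as channels
    z = relu(conv_transpose2d(z, p["up1"]))           # (CH, 8, 8)
    return sigmoid(conv_transpose2d(z, p["up2"]))[0]  # (16, 16), pixels in (0, 1)


def invert_explanation(y, p):
    """Stage 1 of the two-stage attack: predict a saliency map from y alone."""
    return sigmoid(y @ p["expl_fc"]).reshape(IMG, IMG)


def two_stage_attack(y, p):
    """For a non-explainable target: invert an explanation first, then the image."""
    e_hat = invert_explanation(y, p)
    return invert_image(y, e_hat, p)


y = np.exp(rng.normal(size=NUM_CLASSES))
y /= y.sum()                # softmax-like prediction vector
e = rng.random((IMG, IMG))  # stand-in for a released explanation map
x_hat = invert_image(y, e, params)
x_hat_2stage = two_stage_attack(y, params)
print(x_hat.shape, x_hat_2stage.shape)  # prints (16, 16) (16, 16)
```

The explanation enters as a spatial channel rather than being flattened, which is one simple way for a transposed CNN decoder to use the spatial knowledge in an explanation. In the surrogate setting, `invert_explanation` would presumably be trained to reproduce a surrogate model's explanations, and `two_stage_attack` substitutes that predicted map only because the target releases none.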

updated: Mon Apr 26 2021 15:53:57 GMT+0000 (UTC)

published: Mon Apr 26 2021 15:53:57 GMT+0000 (UTC)

arXiv