Label-Only Model Inversion Attacks via Boundary Repulsion

Mostafa Kahla; Si Chen; Hoang Anh Just; Ruoxi Jia

Recent studies show that state-of-the-art deep neural networks are vulnerable to model inversion attacks, in which access to a model is abused to reconstruct private training data of any given target class. Existing attacks rely on access to either the complete target model (whitebox) or the model's soft labels (blackbox). However, no prior work has addressed the harder but more practical scenario in which the attacker has access only to the model's predicted label, without a confidence measure. In this paper, we introduce an algorithm, Boundary-Repelling Model Inversion (BREP-MI), to invert private training data using only the target model's predicted labels. The key idea of our algorithm is to evaluate the model's predicted labels over a sphere and then estimate the direction to reach the target class's centroid. Using the example of face recognition, we show that the images reconstructed by BREP-MI successfully reproduce the semantics of the private training data for various datasets and target model architectures. We compare BREP-MI with state-of-the-art whitebox and blackbox model inversion attacks, and the results show that, despite assuming less knowledge about the target model, BREP-MI outperforms the blackbox attack and achieves results comparable to the whitebox attack.
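The key idea stated in the abstract (query hard labels on a sphere, then estimate a direction toward the target class's centroid) can be illustrated with a toy sketch. Everything concrete here is an assumption made for illustration and is not taken from the abstract: the sign-weighted average used as the direction estimate, the antithetic sampling, the radius-growth rule, the stopping rule, and the 2-D disc that stands in for a target class. The function names `boundary_repel`, `estimate_direction`, and `toy_is_target` are hypothetical, and the abstract does not say in which space the search runs.

```python
import numpy as np


def sample_directions(n_pairs, dim, rng):
    """Draw random unit vectors together with their negations.

    The antithetic pairs are an illustrative variance-reduction choice,
    not something stated in the abstract.
    """
    v = rng.normal(size=(n_pairs, dim))
    v /= np.linalg.norm(v, axis=1, keepdims=True)
    return np.concatenate([v, -v], axis=0)


def estimate_direction(z, radius, is_target, n_pairs, rng):
    """Query the label-only oracle on a sphere around z.

    Each direction is weighted +1 if the queried point is labeled as the
    target class and -1 otherwise; the mean then points away from the
    nearby decision boundary (assumed form of the estimator).
    """
    u = sample_directions(n_pairs, z.shape[0], rng)
    signs = np.array([1.0 if is_target(z + radius * ui) else -1.0 for ui in u])
    direction = (signs[:, None] * u).mean(axis=0)
    frac_inside = float(np.mean(signs > 0))
    return direction, frac_inside


def boundary_repel(z0, is_target, radius=0.5, step=0.5, growth=1.3,
                   n_pairs=32, max_iters=200, seed=0):
    """Move away from the boundary; grow the sphere once it fits inside."""
    rng = np.random.default_rng(seed)
    z = np.asarray(z0, dtype=float).copy()
    best_z, best_radius = z.copy(), 0.0
    for _ in range(max_iters):
        direction, frac_inside = estimate_direction(z, radius, is_target, n_pairs, rng)
        if frac_inside == 1.0:
            # Every sampled point is in the target class: record and enlarge.
            best_z, best_radius = z.copy(), radius
            radius *= growth
        elif frac_inside == 0.0:
            # The sphere no longer touches the target region: stop (assumed rule).
            break
        else:
            z = z + step * direction
    return best_z, best_radius


# Toy label-only oracle: the "target class" is a disc of radius 2 (hypothetical).
center = np.array([3.0, -2.0])


def toy_is_target(x):
    return bool(np.linalg.norm(x - center) < 2.0)


z_final, r_final = boundary_repel(np.array([4.6, -2.0]), toy_is_target)
print("final point:", z_final, "largest fitting radius:", r_final)
```

The sketch only ever calls `toy_is_target`, which returns a hard yes/no label, so it mirrors the label-only setting: no gradients or confidence scores are used. A real attack would replace the toy oracle with queries to the target classifier.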

updated: Thu Mar 03 2022 18:57:57 GMT+0000 (UTC)

published: Thu Mar 03 2022 18:57:57 GMT+0000 (UTC)

arXiv
