Reconstructing Training Data from Trained Neural Networks

Niv Haim; Gal Vardi; Gilad Yehudai; Ohad Shamir; Michal Irani

Understanding the extent to which neural networks memorize their training data is an intriguing question with both practical and theoretical implications. In this paper, we show that in some cases a significant fraction of the training data can in fact be reconstructed from the parameters of a trained neural network classifier. We propose a novel reconstruction scheme that stems from recent theoretical results on the implicit bias of training neural networks with gradient-based methods. To the best of our knowledge, our results are the first to show that reconstructing a large portion of the actual training samples from a trained neural network classifier is generally possible. This has negative implications for privacy, as it can be used as an attack to reveal sensitive training data. We demonstrate our method for binary MLP classifiers on a few standard computer vision datasets.
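
The abstract does not spell out the reconstruction objective, so the following is only an illustrative sketch under stated assumptions. Implicit-bias results for homogeneous networks trained with gradient methods suggest that the trained parameters theta approximately satisfy a stationarity condition of the form theta = sum_i lambda_i * y_i * grad_theta Phi(theta; x_i), with lambda_i >= 0. Assuming a scheme that treats candidate samples x_i and coefficients lambda_i as unknowns and penalizes violations of that condition, the quantity to minimize could look like the residual below. The bias-free two-layer ReLU network and all function names (`forward`, `grad_params`, `kkt_residual`) are hypothetical choices made for this sketch, not taken from the paper.

```python
# Illustrative sketch only: the network, names, and objective are assumptions.

def relu(z):
    return z if z > 0.0 else 0.0

def dot(a, b):
    return sum(ak * bk for ak, bk in zip(a, b))

def forward(W, v, x):
    """Phi(theta; x) = sum_j v[j] * relu(W[j] . x), a bias-free two-layer ReLU net."""
    return sum(vj * relu(dot(wj, x)) for wj, vj in zip(W, v))

def flatten(W, v):
    """Flatten the parameters as [rows of W ..., v]."""
    return [wk for wj in W for wk in wj] + list(v)

def grad_params(W, v, x):
    """Gradient of Phi with respect to the parameters, in flatten() order."""
    g_W, g_v = [], []
    for wj, vj in zip(W, v):
        pre = dot(wj, x)
        active = 1.0 if pre > 0.0 else 0.0
        g_W.extend(vj * active * xk for xk in x)  # d Phi / d W[j]
        g_v.append(relu(pre))                     # d Phi / d v[j]
    return g_W + g_v

def kkt_residual(W, v, xs, ys, lams):
    """Squared norm of theta - sum_i lam_i * y_i * grad Phi(theta; x_i).

    Assumed objective: a reconstruction would search over xs and lams
    (with lams >= 0) to drive this residual toward zero.
    """
    theta = flatten(W, v)
    combo = [0.0] * len(theta)
    for x, y, lam in zip(xs, ys, lams):
        for k, gk in enumerate(grad_params(W, v, x)):
            combo[k] += lam * y * gk
    return sum((t - c) ** 2 for t, c in zip(theta, combo))

# Toy usage with made-up parameters and candidate samples.
W = [[1.0, -2.0], [0.5, 1.0]]
v = [1.0, -1.0]
candidates = [[1.0, 0.0], [0.0, 1.0]]
labels = [1, -1]
print(kkt_residual(W, v, candidates, labels, [0.3, 0.7]))
```

In a real attack the candidates and coefficients would be updated by a gradient-based optimizer, typically with automatic differentiation; this sketch only evaluates the residual for fixed inputs.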

updated: Mon Dec 05 2022 14:49:34 GMT+0000 (UTC)

published: Wed Jun 15 2022 18:35:16 GMT+0000 (UTC)

arXiv
