A Privacy-Preserving Walk in the Latent Space of Generative Models for Medical Applications

Matteo Pennisi; Federica Proietto Salanitri; Giovanni Bellitto; Simone Palazzo; Ulas Bagci; Concetto Spampinato

Generative Adversarial Networks (GANs) have demonstrated their ability to generate synthetic samples that match a target distribution. However, from a privacy perspective, using GANs as a proxy for data sharing is not a safe solution, as they tend to embed near-duplicates of real samples in the latent space. Recent works, inspired by k-anonymity principles, address this issue through sample aggregation in the latent space, with the drawback of reducing the dataset by a factor of k. Our work aims to mitigate this problem by proposing a latent space navigation strategy that can generate diverse synthetic samples to support effective training of deep models, while addressing privacy concerns in a principled way. Our approach uses an auxiliary identity classifier as a guide to walk non-linearly between points in the latent space, minimizing the risk of collision with near-duplicates of real samples. We empirically demonstrate that, given any random pair of points in the latent space, our walking strategy is safer than linear interpolation. We then test our path-finding strategy combined with k-same methods and demonstrate, on two benchmarks for tuberculosis and diabetic retinopathy classification, that training a model on samples generated by our approach mitigates the drop in performance while preserving privacy.
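The contrast between linear interpolation and a guided, non-linear walk can be illustrated with a toy sketch. This is not the authors' method: the abstract describes an auxiliary identity classifier as the guide, whereas the sketch below substitutes a simple distance-based repulsion from a set of "anchor" latent points that stand in for near-duplicates of real samples. The names `linear_path`, `guided_path`, `min_anchor_distance`, `anchors`, and `radius` are all hypothetical and chosen for illustration only.

```python
import numpy as np


def linear_path(z_start, z_end, steps):
    """Evenly spaced points on the straight line from z_start to z_end."""
    t = np.linspace(0.0, 1.0, steps)[:, None]
    return (1.0 - t) * z_start + t * z_end


def guided_path(z_start, z_end, anchors, steps, radius=0.5):
    """Toy non-linear walk (assumption: distance-based repulsion, not a classifier).

    Each step moves toward z_end, then projects the point out of any ball of
    the given radius around an anchor. The final step lands on z_end and is
    not projected, so the walk always ends at the requested endpoint.
    """
    z = np.asarray(z_start, dtype=float).copy()
    path = [z.copy()]
    for i in range(1, steps):
        # Cover an equal share of the remaining distance at every step.
        z = z + (z_end - z) / (steps - i)
        if i < steps - 1:
            for a in anchors:
                d = np.linalg.norm(z - a)
                if d < radius:
                    # Fall back to a fixed direction if z sits exactly on the anchor.
                    direction = (z - a) / d if d > 1e-12 else np.eye(len(z))[0]
                    z = a + direction * radius
        path.append(z.copy())
    return np.stack(path)


def min_anchor_distance(path, anchors):
    """Smallest distance between any path point and any anchor."""
    diffs = path[:, None, :] - anchors[None, :, :]
    return np.linalg.norm(diffs, axis=-1).min()


if __name__ == "__main__":
    z0, z1 = np.array([-2.0, 0.0]), np.array([2.0, 0.0])
    anchors = np.array([[0.0, 0.1]])  # hypothetical near-duplicate location
    print(min_anchor_distance(linear_path(z0, z1, 21), anchors))
    print(min_anchor_distance(guided_path(z0, z1, anchors, 21), anchors))
```

In this two-dimensional example the straight line passes close to the anchor, while the guided walk keeps every intermediate point at least `radius` away from it. A classifier-guided walk as described in the abstract would replace the distance test with a learned identity score.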

updated: Thu Jul 06 2023 13:35:48 GMT+0000 (UTC)

published: Thu Jul 06 2023 13:35:48 GMT+0000 (UTC)

arXiv
