Securing Biomedical Images from Unauthorized Training with Anti-Learning Perturbation

Yixin Liu; Haohui Ye; Kai Zhang; Lichao Sun

The volume of open-source biomedical data has been essential to progress across the healthcare community, since more 'free' data gives individual researchers more opportunities to contribute. However, institutions often hesitate to release their data publicly because unauthorized third parties may exploit it for commercial purposes (e.g., training AI models). This reluctance could hinder the development of the whole healthcare research community. To address this concern, we propose a novel approach, termed 'unlearnable biomedical image', that protects biomedical data by injecting imperceptible but delusive noise into it, making the data unexploitable by AI models. We formulate the problem as a bi-level optimization and propose three anti-learning perturbation generation approaches to solve it. Our method is an important step toward encouraging more institutions to contribute their data for the long-term development of the research community.
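The abstract does not spell out the bi-level objective or the three perturbation generators, so the following is only an illustrative sketch of one common instance from the unlearnable-examples literature: error-minimizing noise, where a bounded perturbation and the model are both updated to reduce the training loss. Everything here is an assumption made for illustration, not the authors' method: the logistic-regression stand-in for an image model, the L-infinity bound `eps`, the step sizes, and the hypothetical names `error_minimizing_noise` and `bce_loss`.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def bce_loss(w, X, y):
    # Mean binary cross-entropy of a linear classifier with weights w.
    p = np.clip(sigmoid(X @ w), 1e-12, 1 - 1e-12)
    return float(-np.mean(y * np.log(p) + (1 - y) * np.log(1 - p)))

def error_minimizing_noise(X, y, eps=0.5, outer_steps=20,
                           inner_steps=10, lr_w=0.1, lr_d=0.1):
    # Hypothetical helper: alternates a model update with a perturbation
    # update, both of which lower the training loss (a min-min scheme).
    n, d = X.shape
    w = np.zeros(d)
    delta = np.zeros_like(X)
    for _ in range(outer_steps):
        # Model step: one gradient-descent step on the perturbed data.
        Xp = X + delta
        w -= lr_w * (Xp.T @ (sigmoid(Xp @ w) - y) / n)
        # Perturbation step: signed-gradient descent on each sample's loss,
        # projected back onto the L-infinity ball of radius eps.
        for _ in range(inner_steps):
            resid = sigmoid((X + delta) @ w) - y
            grad_d = resid[:, None] * w[None, :]
            delta = np.clip(delta - lr_d * np.sign(grad_d), -eps, eps)
    return delta, w

# Toy stand-in for image features: two classes with shifted means.
rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=200).astype(float)
X = rng.normal(size=(200, 5)) + (2 * y[:, None] - 1)
delta, w = error_minimizing_noise(X, y)
print("clean loss:", bce_loss(w, X, y))
print("perturbed loss:", bce_loss(w, X + delta, y))
```

In this toy setting the perturbed data is easier to fit than the clean data, which is the intuition behind this family of methods: a model trained on the protected data has little left to learn from the true features. Whether the paper's three approaches follow this scheme cannot be determined from the abstract alone.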

updated: Sun Mar 05 2023 03:09:03 GMT+0000 (UTC)

published: Sun Mar 05 2023 03:09:03 GMT+0000 (UTC)

arXiv
