Diffusion Art or Digital Forgery? Investigating Data Replication in Diffusion Models

Gowthami Somepalli; Vasu Singla; Micah Goldblum; Jonas Geiping; Tom Goldstein

Cutting-edge diffusion models produce images with high quality and customizability, enabling them to be used for commercial art and graphic design purposes. But do diffusion models create unique works of art, or are they stealing content directly from their training sets? In this work, we study image retrieval frameworks that enable us to compare generated images with training samples and detect when content has been replicated. Applying our frameworks to diffusion models trained on multiple datasets including Oxford flowers, Celeb-A, ImageNet, and LAION, we discuss how factors such as training set size impact rates of content replication. We also identify cases where diffusion models, including the popular Stable Diffusion model, blatantly copy from their training data.
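
The abstract describes the retrieval idea only at a high level. The sketch below is a minimal illustration, not the authors' implementation. It assumes each image has already been mapped to a feature vector by some image encoder (not shown, and not necessarily the one used in the paper). For each generated image, it finds the most similar training image by cosine similarity and flags the pair when the score reaches a threshold. The function names, the toy vectors, and the 0.5 default threshold are all hypothetical.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two feature vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def nearest_training_match(
    generated: Sequence[float],
    training: Dict[str, Sequence[float]],
) -> Tuple[Optional[str], float]:
    """Brute-force search for the training feature most similar to `generated`."""
    best_id, best_sim = None, -1.0
    for train_id, feat in training.items():
        sim = cosine_similarity(generated, feat)
        if sim > best_sim:
            best_id, best_sim = train_id, sim
    return best_id, best_sim


def flag_replications(
    generated: Dict[str, Sequence[float]],
    training: Dict[str, Sequence[float]],
    threshold: float = 0.5,  # hypothetical cutoff; would need tuning per encoder
) -> List[Tuple[str, str, float]]:
    """Return (generated_id, training_id, similarity) for suspected copies."""
    flagged = []
    for gen_id, feat in generated.items():
        train_id, sim = nearest_training_match(feat, training)
        if train_id is not None and sim >= threshold:
            flagged.append((gen_id, train_id, sim))
    return flagged


# Toy usage with made-up 3-dimensional "features".
training = {"train_0": [1.0, 0.0, 0.0], "train_1": [0.0, 1.0, 0.0]}
generated = {"gen_a": [0.9, 0.1, 0.0], "gen_b": [0.0, 0.0, 1.0]}
matches = flag_replications(generated, training, threshold=0.9)
print(matches)
```

Here `gen_a` is nearly parallel to `train_0` and is flagged, while `gen_b` is orthogonal to every training vector and is not. At LAION scale, the brute-force loop would have to be replaced by an approximate nearest-neighbour index, and the threshold would need to be chosen empirically for whatever encoder is used.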

published: Wed Dec 07 2022 18:58:02 UTC

updated: Thu Dec 08 2022 18:59:30 UTC

arXiv
