Masked Diffusion Models Are Fast and Privacy-Aware Learners

Jiachen Lei; Peng Cheng; Zhongjie Ba; Kui Ren

Diffusion models have emerged as the de facto technique for image generation, yet they entail significant computational overhead, which hinders their broader application in the research community. We propose a prior-based denoising training framework, the first to incorporate the pre-train and fine-tune paradigm into the diffusion model training process, which substantially improves training efficiency and shows potential in facilitating various downstream tasks. Our approach centers on masking a high proportion (e.g., up to 90%) of the input image and employing masked denoising score matching to denoise the visible areas, thereby guiding the diffusion model to learn more salient features from the training data as prior knowledge. By utilizing masked learning in a pre-training stage, we efficiently train a ViT-based diffusion model on CelebA-HQ 256×256 in the pixel space, achieving a 4x acceleration and enhancing the quality of generated images compared to the denoising diffusion probabilistic model (DDPM). Moreover, our masked pre-training technique can be universally applied to various diffusion models that directly generate images in the pixel space, aiding in the learning of pre-trained models with superior generalizability. For instance, a diffusion model pre-trained on VGGFace2 attains a 46% quality improvement through fine-tuning with merely 10% of the data from a different distribution. Finally, our method shows the potential to serve as a training paradigm for enhancing the privacy protection capabilities of diffusion models. Our code is available at https://github.com/jiachenlei/maskdm.
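The masking step described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation (see the repository linked above for that). It assumes a standard DDPM-style noise-prediction loss as a stand-in for the masked denoising score matching objective, a patch size of 16, and a single noise level `alpha_bar_t`. All function names (`patchify`, `sample_visible_indices`, `masked_denoising_loss`) and the `predict_noise` callback are hypothetical.

```python
import numpy as np


def patchify(image, patch_size):
    """Split an (H, W, C) image into N flattened patches of length patch_size**2 * C."""
    h, w, c = image.shape
    gh, gw = h // patch_size, w // patch_size
    patches = image.reshape(gh, patch_size, gw, patch_size, c)
    patches = patches.transpose(0, 2, 1, 3, 4)  # (gh, gw, ps, ps, c)
    return patches.reshape(gh * gw, patch_size * patch_size * c)


def sample_visible_indices(num_patches, mask_ratio, rng):
    """Randomly choose which patches stay visible; the rest are masked out."""
    num_visible = max(1, int(round(num_patches * (1.0 - mask_ratio))))
    return np.sort(rng.permutation(num_patches)[:num_visible])


def masked_denoising_loss(image, predict_noise, alpha_bar_t,
                          mask_ratio=0.9, patch_size=16, rng=None):
    """Noise-prediction MSE computed on the visible patches only (assumed objective)."""
    if rng is None:
        rng = np.random.default_rng()
    patches = patchify(image, patch_size)
    visible = sample_visible_indices(len(patches), mask_ratio, rng)
    x0 = patches[visible]  # only visible patches are ever processed
    noise = rng.standard_normal(x0.shape)
    # Forward diffusion at one noise level: x_t = sqrt(a) * x0 + sqrt(1 - a) * eps
    x_t = np.sqrt(alpha_bar_t) * x0 + np.sqrt(1.0 - alpha_bar_t) * noise
    predicted = predict_noise(x_t, visible)  # hypothetical model call
    loss = float(np.mean((predicted - noise) ** 2))
    return loss, visible


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    image = rng.standard_normal((256, 256, 3))  # placeholder for a 256x256 RGB image

    def dummy_model(x_t, idx):
        # Stand-in for a ViT denoiser: always predicts zero noise.
        return np.zeros_like(x_t)

    loss, visible = masked_denoising_loss(image, dummy_model, alpha_bar_t=0.5, rng=rng)
    print(len(visible), "visible patches, loss =", loss)
```

Because only the visible patches are passed to the model, a 90% mask ratio shrinks the token sequence a ViT would process by roughly a factor of ten, which is one plausible source of the training speedup the abstract reports.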

updated: Thu Aug 03 2023 16:55:34 GMT+0000 (UTC)

published: Tue Jun 20 2023 08:02:59 GMT+0000 (UTC)

arXiv
