Unsupervised Representation Learning from Pre-trained Diffusion Probabilistic Models

Zijian Zhang; Zhou Zhao; Zhijie Lin

Diffusion Probabilistic Models (DPMs) have shown a powerful capacity for generating high-quality image samples. Recently, diffusion autoencoders (Diff-AE) were proposed to explore DPMs for representation learning via autoencoding. Their key idea is to jointly train an encoder that discovers meaningful representations from images and a conditional DPM that serves as the decoder for reconstructing images. Since training DPMs from scratch takes a long time and numerous pre-trained DPMs already exist, we propose Pre-trained DPM AutoEncoding (PDAE), a general method for adapting existing pre-trained DPMs into decoders for image reconstruction, with better training efficiency and performance than Diff-AE. Specifically, we find that pre-trained DPMs fail to reconstruct an image from its latent variables because of the information loss in the forward process, which causes a gap between their predicted posterior mean and the true one. From this perspective, the classifier-guided sampling method can be explained as computing an extra mean shift to fill the gap, reconstructing the lost class information in samples. Together, these observations imply that the gap corresponds to the lost information of the image and that we can reconstruct the image by filling the gap. Drawing inspiration from this, we employ a trainable model to predict a mean shift from the encoded representation and train it to fill as much of the gap as possible. In this way, the encoder is forced to learn as much information as possible from images to help the filling. By reusing part of the network of pre-trained DPMs and redesigning the weighting scheme of the diffusion loss, PDAE can learn meaningful representations from images efficiently. Extensive experiments demonstrate the effectiveness, efficiency, and flexibility of PDAE.
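
The "gap" and "mean shift" described in the abstract can be made concrete with the standard DDPM parameterization. The block below is a sketch under stated assumptions, not the paper's exact formulation: the abstract itself gives no equations, so the symbols are assumed here. They are the noise schedule $\beta_t$, $\alpha_t = 1 - \beta_t$, $\bar\alpha_t = \prod_{s \le t} \alpha_s$, the pre-trained noise predictor $\epsilon_\theta$ with covariance $\Sigma_\theta$, a classifier $p_\phi(y \mid x_t)$ with guidance scale $s$, an encoder $E_\varphi$, and a trainable shift predictor $G_\psi$.

```latex
% Forward process (standard DDPM): x_t = \sqrt{\bar\alpha_t}\, x_0 + \sqrt{1-\bar\alpha_t}\, \epsilon, \quad \epsilon \sim \mathcal{N}(0, I)

% True posterior mean, which depends on x_0 through the noise \epsilon actually added:
\tilde\mu_t(x_t, x_0) = \frac{1}{\sqrt{\alpha_t}} \left( x_t - \frac{\beta_t}{\sqrt{1-\bar\alpha_t}}\, \epsilon \right)

% Mean predicted by the pre-trained DPM, which sees only x_t and t:
\mu_\theta(x_t, t) = \frac{1}{\sqrt{\alpha_t}} \left( x_t - \frac{\beta_t}{\sqrt{1-\bar\alpha_t}}\, \epsilon_\theta(x_t, t) \right)

% The gap caused by the information lost in the forward process:
\tilde\mu_t(x_t, x_0) - \mu_\theta(x_t, t) = \frac{\beta_t}{\sqrt{\alpha_t}\sqrt{1-\bar\alpha_t}} \bigl( \epsilon_\theta(x_t, t) - \epsilon \bigr)

% Classifier guidance fills part of the gap with a class-dependent mean shift:
\mu_\theta(x_t, t) + s\, \Sigma_\theta(x_t, t)\, \nabla_{x_t} \log p_\phi(y \mid x_t)

% Assumed PDAE analogue: a learned shift conditioned on the representation z = E_\varphi(x_0),
% trained so that the shifted mean approaches \tilde\mu_t(x_t, x_0):
\mu_\theta(x_t, t) + \Sigma_\theta(x_t, t)\, G_\psi(x_t, z, t)
```

Read this way, the gap is proportional to the noise-prediction error of the unconditional model, so a shift predictor can reduce it only to the extent that $z$ carries information about $x_0$ that $x_t$ has lost. This is consistent with the abstract's claim that training the shift forces the encoder to learn as much information as possible from the image.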

updated: Wed Mar 01 2023 04:35:39 GMT+0000 (UTC)

published: Mon Dec 26 2022 02:37:38 GMT+0000 (UTC)

arXiv
