Fast Training of Diffusion Models with Masked Transformers

Hongkai Zheng; Weili Nie; Arash Vahdat; Anima Anandkumar

We propose an efficient approach to training large diffusion models with masked transformers. While masked transformers have been extensively explored for representation learning, their application to generative learning has received less attention in the vision domain. Our work is the first to exploit masked training to significantly reduce the training cost of diffusion models. Specifically, we randomly mask out a high proportion (e.g., 50%) of the patches in diffused input images during training. For masked training, we introduce an asymmetric encoder-decoder architecture consisting of a transformer encoder that operates only on the unmasked patches and a lightweight transformer decoder that operates on the full set of patches. To promote a long-range understanding of the full set of patches, we add an auxiliary task of reconstructing the masked patches to the denoising score matching objective, which learns the score of the unmasked patches. Experiments on ImageNet-256×256 show that our approach achieves the same performance as the state-of-the-art Diffusion Transformer (DiT) model while using only 31% of its original training time. Thus, our method allows for efficient training of diffusion models without sacrificing generative performance.
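The masking step and the combined objective described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the `recon_weight` value of 0.1, and the reduction of each patch to a single scalar are assumptions made for illustration. The abstract does not state the exact reconstruction target, so the sketch takes it as an input.

```python
import random


def split_patches(num_patches, mask_ratio, rng):
    """Randomly pick which patch indices are masked; return (kept, masked)."""
    num_masked = int(num_patches * mask_ratio)
    order = list(range(num_patches))
    rng.shuffle(order)
    masked = sorted(order[:num_masked])
    kept = sorted(order[num_masked:])
    return kept, masked


def mean_squared_error(pred, target, indices):
    """Mean squared error restricted to the given patch indices."""
    if not indices:
        return 0.0
    return sum((pred[i] - target[i]) ** 2 for i in indices) / len(indices)


def masked_training_loss(pred, score_target, recon_target, kept, masked,
                         recon_weight=0.1):
    """Score-matching term on unmasked patches plus a weighted
    reconstruction term on masked patches.

    pred         : decoder output for every patch (one scalar per patch here)
    score_target : denoising score matching target per patch
    recon_target : reconstruction target per patch (assumed to be given)
    recon_weight : hypothetical weighting, not taken from the source
    """
    dsm_term = mean_squared_error(pred, score_target, kept)
    recon_term = mean_squared_error(pred, recon_target, masked)
    return dsm_term + recon_weight * recon_term


# Usage: mask 50% of 256 patches (256 is an arbitrary count for illustration).
kept, masked = split_patches(256, 0.5, random.Random(0))
# Only the `kept` patches would be passed to the encoder; the decoder
# would then produce `pred` for all 256 patches.
```

In this arrangement the expensive encoder sees only the unmasked half of the patches, while the decoder's output over all patches supplies both loss terms.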

updated: Thu Jun 15 2023 17:38:48 GMT+0000 (UTC)

published: Thu Jun 15 2023 17:38:48 GMT+0000 (UTC)

arXiv
