DiffFit: Unlocking Transferability of Large Diffusion Models via Simple Parameter-Efficient Fine-Tuning

Enze Xie; Lewei Yao; Han Shi; Zhili Liu; Daquan Zhou; Zhaoqiang Liu; Jiawei Li; Zhenguo Li

Diffusion models have proven to be highly effective in generating high-quality images. However, adapting large pre-trained diffusion models to new domains, which is critical for real-world applications, remains an open challenge. This paper proposes DiffFit, a parameter-efficient strategy for fine-tuning large pre-trained diffusion models that enables fast adaptation to new domains. DiffFit is embarrassingly simple: it fine-tunes only the bias terms and newly added scaling factors in specific layers, yet it yields a significant training speed-up and reduced model storage costs. Compared with full fine-tuning, DiffFit achieves a 2× training speed-up and only needs to store approximately 0.12% of the total model parameters. We provide an intuitive theoretical analysis to justify the efficacy of the scaling factors for fast adaptation. On 8 downstream datasets, DiffFit achieves superior or competitive performance compared to full fine-tuning while being more efficient. Remarkably, we show that DiffFit can adapt a pre-trained low-resolution generative model to a high-resolution one at minimal additional cost. Among diffusion-based methods, DiffFit sets a new state-of-the-art FID of 3.02 on the ImageNet 512×512 benchmark by fine-tuning for only 25 epochs from a public pre-trained ImageNet 256×256 checkpoint, while being 30× more training-efficient than the closest competitor.
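The core recipe described above (freeze the pre-trained weights and train only bias terms plus newly added scaling factors) can be illustrated with a small, dependency-free sketch. Everything in it is an assumption for illustration, not the authors' code: the DiT-like parameter names, the placement of one scalar scaling factor per attention branch and per MLP branch (`gamma_attn`, `gamma_mlp`), and the helper names `toy_dit_params`, `apply_difffit_freeze` and `gated_residual` are all hypothetical. It shows the freezing rule, the resulting trainable-parameter count, and how a scaling factor initialised to 1 gates a residual branch.

```python
from dataclasses import dataclass
from math import prod
from typing import Callable, List, Tuple


@dataclass
class Param:
    """A named parameter tensor, tracked only by shape (no actual values)."""
    name: str
    shape: Tuple[int, ...]
    trainable: bool = True

    @property
    def numel(self) -> int:
        return prod(self.shape)


def toy_dit_params(depth: int, hidden: int, mlp_ratio: int = 4) -> List[Param]:
    """Hypothetical inventory of a DiT-like block stack.

    Assumption: one scalar scaling factor per attention branch and per MLP
    branch; embeddings, norms and conditioning layers are omitted.
    """
    params: List[Param] = []
    for i in range(depth):
        p = f"blocks.{i}"
        params += [
            Param(f"{p}.attn.qkv.weight", (3 * hidden, hidden)),
            Param(f"{p}.attn.qkv.bias", (3 * hidden,)),
            Param(f"{p}.attn.proj.weight", (hidden, hidden)),
            Param(f"{p}.attn.proj.bias", (hidden,)),
            Param(f"{p}.mlp.fc1.weight", (mlp_ratio * hidden, hidden)),
            Param(f"{p}.mlp.fc1.bias", (mlp_ratio * hidden,)),
            Param(f"{p}.mlp.fc2.weight", (hidden, mlp_ratio * hidden)),
            Param(f"{p}.mlp.fc2.bias", (hidden,)),
            # Newly added scaling factors (assumed scalar, initialised to 1).
            Param(f"{p}.gamma_attn", (1,)),
            Param(f"{p}.gamma_mlp", (1,)),
        ]
    return params


def apply_difffit_freeze(params: List[Param]) -> Tuple[int, int]:
    """Freeze everything except bias terms and scaling factors.

    Returns (trainable_count, total_count).
    """
    for p in params:
        p.trainable = p.name.endswith(".bias") or ".gamma_" in p.name
    trainable = sum(p.numel for p in params if p.trainable)
    total = sum(p.numel for p in params)
    return trainable, total


def gated_residual(
    x: List[float], branch: Callable[[List[float]], List[float]], gamma: float
) -> List[float]:
    """Compute x + gamma * branch(x).

    Assuming the pre-trained block is x + branch(x), gamma = 1.0 reproduces it
    exactly, so fine-tuning starts from the pre-trained function.
    """
    return [xi + gamma * bi for xi, bi in zip(x, branch(x))]


params = toy_dit_params(depth=2, hidden=8)
trainable, total = apply_difffit_freeze(params)
print(f"{trainable}/{total} parameters trainable ({100 * trainable / total:.2f}%)")
```

Because this toy omits embeddings, normalisation and conditioning layers and uses a tiny hidden size, its trainable fraction (roughly 9% here) does not reproduce the 0.12% reported for the real model. The trend is what matters: weight matrices grow quadratically with the hidden size while biases and scalar factors grow linearly or not at all, so the trainable fraction shrinks quickly in wide networks. In a PyTorch model, the same rule would typically be applied by toggling `requires_grad` on each entry of `named_parameters()`.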

updated: Tue Apr 25 2023 11:57:21 GMT+0000 (UTC)

published: Thu Apr 13 2023 16:17:50 GMT+0000 (UTC)

arXiv