Multi-Architecture Multi-Expert Diffusion Models

Yunsung Lee; Jin-Young Kim; Hyojun Go; Myeongho Jeong; Shinhyeok Oh; Seungtaek Choi

Diffusion models have achieved impressive results in generating diverse and realistic data by employing multi-step denoising processes. However, because the denoiser must accommodate large variations in input noise across time-steps, diffusion models require a large number of parameters. We observe that diffusion models effectively act as filters for different frequency ranges at each time-step. While some previous works have introduced multi-expert strategies that assign denoisers to different noise intervals, they overlook the importance of operations specialized for high and low frequencies. For instance, self-attention operations are effective at handling low-frequency components (acting as low-pass filters), while convolutions excel at capturing high-frequency features (acting as high-pass filters). In other words, existing diffusion models employ denoisers with the same architecture at every time-step, without considering the optimal operations for the noise level at each time-step. To address this limitation, we propose a novel approach called Multi-architecturE Multi-Expert (MEME), which consists of multiple experts with specialized architectures tailored to the operations required at each time-step interval. Through extensive experiments, we demonstrate that MEME outperforms large competitors in terms of both generation performance and computational efficiency.
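
The multi-expert idea in the abstract, in which each denoiser is responsible for one time-step interval, can be illustrated with a minimal routing sketch. This is not the authors' implementation: the names `ExpertSpec`, `build_experts`, `select_expert`, and the `conv_fraction` knob are hypothetical, the equal-width intervals are an assumption, and so is the direction of the architecture schedule (more convolution at low-noise time-steps, more self-attention at high-noise time-steps), which the abstract does not specify.

```python
from bisect import bisect_right
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ExpertSpec:
    """Hypothetical description of one expert denoiser."""
    index: int
    t_start: int          # first time-step handled (inclusive)
    t_end: int            # last time-step handled (exclusive)
    conv_fraction: float  # assumed knob: share of convolution vs. self-attention


def build_experts(num_timesteps: int, num_experts: int) -> List[ExpertSpec]:
    """Split [0, num_timesteps) into equal-width intervals, one per expert.

    Equal widths are an assumption made for illustration only.
    """
    cuts = [round(i * num_timesteps / num_experts) for i in range(num_experts + 1)]
    experts = []
    for i in range(num_experts):
        # Assumed schedule: expert 0 (least noise) is all convolution,
        # the last expert (most noise) is all self-attention.
        if num_experts > 1:
            conv_fraction = 1.0 - i / (num_experts - 1)
        else:
            conv_fraction = 0.5
        experts.append(ExpertSpec(i, cuts[i], cuts[i + 1], conv_fraction))
    return experts


def select_expert(t: int, experts: List[ExpertSpec]) -> ExpertSpec:
    """Return the expert whose interval contains time-step t."""
    if not experts[0].t_start <= t < experts[-1].t_end:
        raise ValueError(f"time-step {t} is outside the covered range")
    # Interior boundaries are the end points of all experts except the last.
    boundaries = [e.t_end for e in experts[:-1]]
    return experts[bisect_right(boundaries, t)]


if __name__ == "__main__":
    experts = build_experts(num_timesteps=1000, num_experts=4)
    for t in (0, 249, 250, 999):
        e = select_expert(t, experts)
        print(t, e.index, (e.t_start, e.t_end), round(e.conv_fraction, 3))
```

In a real sampler, each `ExpertSpec` would be backed by a separately trained network, and only the expert selected for the current time-step would run. That is one plausible reading of how a multi-expert design could reduce per-step compute, under the assumptions stated above.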

updated: Thu Jun 08 2023 07:24:08 GMT+0000 (UTC)

published: Thu Jun 08 2023 07:24:08 GMT+0000 (UTC)

arXiv
