Flow Matching in Latent Space

Quan Dao; Hao Phung; Binh Nguyen; Anh Tran

Flow matching is a recent framework for training generative models that shows impressive empirical performance while being easier to train than diffusion-based models. Despite these advantages, prior methods still suffer from expensive computation and a large number of function evaluations when off-the-shelf solvers are run in pixel space. Furthermore, although latent-based generative methods have shown great success in recent years, they remain underexplored for flow matching. In this work, we propose to apply flow matching in the latent spaces of pretrained autoencoders, which offers improved computational efficiency and scalability for high-resolution image synthesis. This enables flow-matching training on constrained computational resources while maintaining quality and flexibility. Additionally, our work is a pioneering contribution in integrating various conditions into flow matching for conditional generation tasks, including label-conditioned image generation, image inpainting, and semantic-to-image generation. Through extensive experiments, our approach demonstrates its effectiveness in both quantitative and qualitative results on various datasets, such as CelebA-HQ, FFHQ, LSUN Church & Bedroom, and ImageNet. We also provide a theoretical control of the Wasserstein-2 distance between the reconstructed latent flow distribution and the true data distribution, showing that it is upper-bounded by the latent flow matching objective. Our code will be available at https://github.com/VinAIResearch/LFM.git.
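As a rough illustration of what training and sampling with flow matching in a latent space can look like, the sketch below regresses a velocity field on latent codes and then integrates it with a fixed-step Euler solver. It is a minimal sketch under stated assumptions, not the authors' implementation: the abstract does not specify the probability path, so a straight-line interpolation between Gaussian noise and the latent code is assumed here; the pretrained encoder and decoder are omitted, with the latents taken as already encoded; and the names `sample_training_pair`, `flow_matching_loss`, `euler_sample`, and `velocity_fn` are hypothetical.

```python
import random


def sample_training_pair(z1, rng):
    """Build one regression example for a latent code z1 (a list of floats).

    Assumption: straight-line path z_t = (1 - t) * z0 + t * z1 between
    Gaussian noise z0 and the latent z1, whose velocity is z1 - z0.
    """
    z0 = [rng.gauss(0.0, 1.0) for _ in z1]      # noise endpoint
    t = rng.random()                             # time drawn from [0, 1)
    zt = [(1.0 - t) * a + t * b for a, b in zip(z0, z1)]
    target = [b - a for a, b in zip(z0, z1)]     # target velocity
    return zt, t, target


def flow_matching_loss(velocity_fn, latents, rng):
    """Mean squared error between predicted and target velocities."""
    total = 0.0
    for z1 in latents:
        zt, t, target = sample_training_pair(z1, rng)
        pred = velocity_fn(zt, t)
        total += sum((p - u) ** 2 for p, u in zip(pred, target))
    return total / len(latents)


def euler_sample(velocity_fn, z0, n_steps=50):
    """Integrate dz/dt = velocity_fn(z, t) from t = 0 to t = 1 (Euler steps)."""
    z = list(z0)
    dt = 1.0 / n_steps
    for i in range(n_steps):
        v = velocity_fn(z, i * dt)
        z = [a + dt * b for a, b in zip(z, v)]
    return z


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical latents; in practice these would come from a frozen,
    # pretrained autoencoder's encoder applied to images.
    latents = [[rng.gauss(0.0, 1.0) for _ in range(4)] for _ in range(8)]
    untrained = lambda z, t: [0.0 for _ in z]    # placeholder velocity model
    print("loss:", flow_matching_loss(untrained, latents, rng))
    print("sample:", euler_sample(untrained, [0.0] * 4, n_steps=10))
```

In a full pipeline the output of `euler_sample` would be passed through the autoencoder's decoder to obtain an image. Because every solver step operates on the low-dimensional latent rather than on pixels, each function evaluation is cheaper, which is the efficiency argument the abstract makes.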

updated: Mon Jul 17 2023 17:57:56 GMT+0000 (UTC)

published: Mon Jul 17 2023 17:57:56 GMT+0000 (UTC)

arXiv
