Sampling From Autoencoders' Latent Space via Quantization And Probability Mass Function Concepts

Aymene Mohammed Bouayed; Adrian Iaccovelli; David Naccache

In this study, we focus on sampling from the latent space of generative models built upon autoencoders so that the reconstructed samples are lifelike images. To do so, we introduce a novel post-training sampling algorithm rooted in the concept of probability mass functions, coupled with a quantization process. Our proposed algorithm establishes a vicinity around each latent vector from the input data and then draws samples from these neighborhoods. This ensures that the sampled latent vectors lie predominantly in high-probability regions, which in turn can be effectively transformed into realistic images. A natural point of comparison for our sampling algorithm is sampling based on Gaussian mixture models (GMM), owing to its inherent capability to represent clusters. We improve the time complexity from the O(n×d×k×i) associated with GMM sampling to O(n×d), resulting in a substantial runtime speedup. Moreover, our experimental results, measured by the Fréchet inception distance (FID) for image generation, show that our sampling algorithm outperforms GMM sampling across a range of models and datasets. On the MNIST benchmark dataset, our approach improves on GMM sampling by up to 0.89 in FID. When generating face and ocular images, our approach improves FID by 1.69 and 0.87 over GMM sampling on the CelebA and MOBIUS datasets, respectively. Lastly, we show through the Wasserstein distance that our method estimates latent space distributions more effectively than GMM sampling.
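The neighborhood-sampling idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `sample_latents`, the `num_bins` parameter, the per-dimension bin width derived from the latent range, and the uniform noise inside each neighborhood are all assumptions made for illustration. The sketch treats the encoded training latents as an empirical probability mass function, picks one at random, and perturbs it by at most half a quantization bin per dimension.

```python
import numpy as np

def sample_latents(latents, num_samples, num_bins=32, rng=None):
    """Draw new latent vectors from neighborhoods around existing ones.

    Hypothetical sketch: `num_bins` and the uniform noise model are
    assumptions, not details taken from the paper.
    """
    rng = np.random.default_rng() if rng is None else rng
    latents = np.asarray(latents, dtype=float)
    n, d = latents.shape

    # Quantization step: split each dimension's observed range into
    # `num_bins` equal-width bins. Computing min/max costs O(n * d).
    low = latents.min(axis=0)
    high = latents.max(axis=0)
    width = (high - low) / num_bins

    # Empirical probability mass function: every training latent is
    # equally likely to be chosen as the center of a neighborhood.
    centers = latents[rng.integers(0, n, size=num_samples)]

    # Sample uniformly inside a box of one bin width around each center.
    noise = rng.uniform(-0.5, 0.5, size=(num_samples, d)) * width
    return centers + noise
```

In use, `latents` would be the encoder outputs for the training set, and the returned array would be passed to the decoder to generate images. Because each sample stays within half a bin of a real latent vector, it remains in a region where training data was observed.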

updated: Mon Aug 21 2023 13:18:12 GMT+0000 (UTC)

published: Mon Aug 21 2023 13:18:12 GMT+0000 (UTC)

arXiv
