BRIGHT: Bi-level Feature Representation of Image Collections using Groups of Hash Tables

Dingdong Yang; Yizhi Wang; Ali Mahdavi-Amiri; Hao Zhang

We present BRIGHT, a bi-level feature representation for an image collection, consisting of a per-image latent space on top of a multi-scale feature grid space. Our representation is learned by an autoencoder that encodes images into continuous key codes, which are used to retrieve features from groups of multi-resolution hash tables. Our key codes and hash tables are trained together continuously with well-defined gradient flows, leading to high usage of the hash table entries and improved generative modeling compared to discrete Vector Quantization (VQ). Unlike existing continuous representations such as KL-regularized latent codes, our key codes are strictly bounded in scale and variance. Overall, feature encoding by BRIGHT is compact, efficient to train, and enables generative modeling over the image codes using state-of-the-art generators such as latent diffusion models (LDMs). Experimental results show that our method achieves reconstruction results comparable to VQ methods while having a smaller and more efficient decoder network. By applying an LDM over our key code space, we achieve state-of-the-art performance on image synthesis on the LSUN-Church and human-face datasets.
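The abstract does not specify how a continuous key code indexes the hash tables, so the following is only a minimal pure-Python sketch of the general idea under stated assumptions, not the authors' implementation. It assumes an Instant-NGP-style spatial hash (XOR of integer grid coordinates times fixed primes, modulo table size), multilinear interpolation between the corners of the grid cell containing the key, sigmoid squashing to keep keys bounded in [0, 1], and a key code split evenly across groups, each with its own stack of multi-resolution tables. All names and parameters (`bound_key`, `lookup`, `grouped_lookup`, `base_res`, `growth`, `group_dim`) are hypothetical. Because the interpolation weights vary continuously with the key, gradients in this sketch reach both the key code and the table entries, whereas a VQ codebook lookup selects a single entry via a non-differentiable nearest-neighbour argmin.

```python
import math
import random

# Assumed Instant-NGP-style hashing primes (supports keys of up to 3 dimensions).
PRIMES = (1, 2654435761, 805459861)


def spatial_hash(corner, table_size):
    """Hash integer grid coordinates into a table slot (assumed scheme)."""
    assert len(corner) <= len(PRIMES), "sketch only supports keys with d <= 3"
    h = 0
    for c, p in zip(corner, PRIMES):
        h ^= c * p
    return h % table_size


def bound_key(raw):
    """Squash unbounded encoder outputs into (0, 1) with a sigmoid (assumption)."""
    return [1.0 / (1.0 + math.exp(-r)) for r in raw]


def make_tables(num_levels, table_size, feat_dim, seed=0):
    """One hash table per resolution level, initialised with tiny random features."""
    rng = random.Random(seed)
    return [
        [[rng.uniform(-1e-4, 1e-4) for _ in range(feat_dim)] for _ in range(table_size)]
        for _ in range(num_levels)
    ]


def lookup(key, tables, base_res=4, growth=2.0):
    """Retrieve features for a key in [0, 1]^d, concatenated over all levels."""
    d = len(key)
    out = []
    for level, table in enumerate(tables):
        res = int(base_res * growth ** level)      # grid resolution of this level
        scaled = [k * res for k in key]
        lo = [int(math.floor(s)) for s in scaled]  # lower corner of enclosing cell
        frac = [s - l for s, l in zip(scaled, lo)]  # position inside the cell
        feat = [0.0] * len(table[0])
        # Blend the features stored at the 2^d corners of the cell.
        for mask in range(2 ** d):
            corner, w = [], 1.0
            for i in range(d):
                bit = (mask >> i) & 1
                corner.append(lo[i] + bit)
                w *= frac[i] if bit else 1.0 - frac[i]
            entry = table[spatial_hash(corner, len(table))]
            for j, v in enumerate(entry):
                feat[j] += w * v
        out.extend(feat)
    return out


def grouped_lookup(key_code, groups, group_dim):
    """Split the key code into chunks; each chunk queries its own table group."""
    out = []
    for g, tables in enumerate(groups):
        sub = key_code[g * group_dim:(g + 1) * group_dim]
        out.extend(lookup(sub, tables))
    return out
```

For example, `grouped_lookup(bound_key(raw), [make_tables(2, 32, 4, seed=s) for s in range(3)], group_dim=2)` turns a 6-dimensional raw code into a 3 × 2 × 4 = 24-dimensional feature vector. In a real system the tables and the encoder would be trained jointly with an autodiff framework rather than plain Python lists.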

updated: Mon May 29 2023 20:34:40 GMT+0000 (UTC)

published: Mon May 29 2023 20:34:40 GMT+0000 (UTC)

arXiv
