VQ3D: Learning a 3D-Aware Generative Model on ImageNet

Kyle Sargent; Jing Yu Koh; Han Zhang; Huiwen Chang; Charles Herrmann; Pratul Srinivasan; Jiajun Wu; Deqing Sun

Recent work has shown the possibility of training generative models of 3D content from 2D image collections on small datasets corresponding to a single object class, such as human faces, animal faces, or cars. However, these models struggle on larger, more complex datasets. To model diverse and unconstrained image collections such as ImageNet, we present VQ3D, which introduces a NeRF-based decoder into a two-stage vector-quantized autoencoder. Our Stage 1 allows for the reconstruction of an input image and the ability to change the camera position around the image, and our Stage 2 allows for the generation of new 3D scenes. VQ3D is capable of generating and reconstructing 3D-aware images from the 1000-class ImageNet dataset of 1.2 million training images. We achieve an ImageNet generation FID score of 16.8, compared to 69.8 for the next best baseline method.

updated: Tue Feb 14 2023 05:15:16 GMT+0000 (UTC)

published: Tue Feb 14 2023 05:15:16 GMT+0000 (UTC)

arXiv
