Unsupervised Novel View Synthesis from a Single Image

Pierluigi Zama Ramirez; Diego Martin Arroyo; Alessio Tonioni; Federico Tombari

Novel view synthesis from a single image has recently achieved remarkable results, but existing methods require some form of 3D, pose, or multi-view supervision at training time, which limits their deployment in real scenarios. This work aims to relax these assumptions, enabling conditional generative models for novel view synthesis to be trained in a completely unsupervised manner. We first pre-train a purely generative decoder model using a 3D-aware GAN formulation while simultaneously training an encoder network to invert the mapping from latent space to images. Then, we swap the encoder and decoder and train the network as a conditional GAN with a mixture of an autoencoder-like objective and self-distillation. At test time, given a view of an object, our model first embeds the image content in a latent code and regresses its pose, then generates novel views of the object by keeping the code fixed and varying the pose. We test our framework both on synthetic datasets such as ShapeNet and on unconstrained collections of natural images, where no competing methods can be trained.
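The test-time procedure described in the abstract (embed the image once, regress its pose, then decode the same code under different poses) can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration: `synthesize_novel_views`, `toy_encoder`, `toy_decoder`, and the (azimuth, elevation) pose parameterization are hypothetical and not taken from the paper, and the toy functions are hand-written stand-ins for the trained networks.

```python
import math
from typing import Callable, List, Sequence, Tuple

Latent = Tuple[float, ...]
Pose = Tuple[float, float]   # assumed (azimuth, elevation) in radians
Image = List[List[float]]    # toy grayscale image: rows of pixel values

Encoder = Callable[[Image], Tuple[Latent, Pose]]
Decoder = Callable[[Latent, Pose], Image]


def synthesize_novel_views(
    image: Image,
    encoder: Encoder,
    decoder: Decoder,
    target_poses: Sequence[Pose],
) -> List[Image]:
    """Embed the image once, then decode the same code under each new pose."""
    latent, _source_pose = encoder(image)  # content code and regressed pose
    return [decoder(latent, pose) for pose in target_poses]


# Hypothetical stand-ins so the sketch runs; real models would be trained networks.
def toy_encoder(image: Image) -> Tuple[Latent, Pose]:
    pixels = [p for row in image for p in row]
    mean = sum(pixels) / len(pixels)
    return (mean,), (0.0, 0.0)  # one-dimensional code, fixed "regressed" pose


def toy_decoder(latent: Latent, pose: Pose, size: int = 4) -> Image:
    azimuth, elevation = pose
    return [
        [
            latent[0] * math.cos(azimuth + 0.1 * col) * math.cos(elevation + 0.1 * row)
            for col in range(size)
        ]
        for row in range(size)
    ]


source = [[0.5] * 4 for _ in range(4)]
poses = [(math.radians(a), 0.0) for a in (0, 45, 90)]
views = synthesize_novel_views(source, toy_encoder, toy_decoder, poses)
```

The point of the sketch is only the control flow: the latent code is computed once and held fixed, and the pose is the sole input that changes between generated views.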

updated: Wed Dec 15 2021 12:19:58 GMT+0000 (UTC)

published: Fri Feb 05 2021 16:56:04 GMT+0000 (UTC)

arXiv

