3D-Aware Encoding for Style-based Neural Radiance Fields

Yu-Jhe Li; Tao Xu; Bichen Wu; Ningyuan Zheng; Xiaoliang Dai; Albert Pumarola; Peizhao Zhang; Peter Vajda; Kris Kitani

We tackle the task of NeRF inversion for style-based neural radiance fields (e.g., StyleNeRF). In this task, we aim to learn an inversion function that projects an input image into the latent space of a NeRF generator and then synthesizes novel views of the original image from the latent code. Compared with GAN inversion for 2D generative models, NeRF inversion must not only 1) preserve the identity of the input image but also 2) ensure 3D consistency across the generated novel views. This requires the latent code obtained from a single-view image to be invariant across views. To address this new challenge, we propose a two-stage encoder for style-based NeRF inversion. In the first stage, we introduce a base encoder that converts the input image into a latent code. To ensure that the latent code is view-invariant and can synthesize 3D-consistent novel-view images, we train the base encoder with identity contrastive learning. In the second stage, to better preserve the identity of the input image, we introduce a refining encoder that refines the latent code and adds finer details to the output image. Importantly, the novelty of this model lies in the design of its first-stage encoder, which produces the closest latent code lying on the latent manifold, so that the refinement in the second stage stays close to the NeRF manifold. Through extensive experiments, we demonstrate that the proposed two-stage encoder outperforms existing inversion encoders, both qualitatively and quantitatively, in image reconstruction and novel-view rendering.
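
The abstract does not state the exact form of the identity contrastive objective, so the following is only a minimal sketch of one common choice, an InfoNCE-style loss, under the assumption that latent codes from two views of the same identity form a positive pair and codes from other identities act as negatives. The function names, the cosine similarity, and the `temperature` value are illustrative assumptions, not the authors' implementation.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors given as lists of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def identity_contrastive_loss(anchor, positive, negatives, temperature=0.1):
    """InfoNCE-style loss (an assumed stand-in for the paper's objective).

    anchor, positive: latent codes encoded from two views of the same identity.
    negatives: latent codes encoded from images of other identities.
    The loss is small when the anchor is closer to the positive than to
    every negative, which pushes the encoder toward view-invariant codes.
    """
    pos = math.exp(cosine(anchor, positive) / temperature)
    neg = sum(math.exp(cosine(anchor, n) / temperature) for n in negatives)
    return -math.log(pos / (pos + neg))


# Toy 2-D "latent codes": the same identity seen from two views, and another identity.
view_a = [1.0, 0.0]
view_b = [0.9, 0.1]
other_identity = [0.0, 1.0]

consistent = identity_contrastive_loss(view_a, view_b, [other_identity])
inconsistent = identity_contrastive_loss(view_a, other_identity, [view_b])
print(consistent < inconsistent)
```

In this toy setting the loss is lower when the two views of the same identity map to nearby codes, which is the property the base encoder is described as needing for 3D-consistent novel views.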

updated: Sat Nov 12 2022 06:14:12 GMT+0000 (UTC)

published: Sat Nov 12 2022 06:14:12 GMT+0000 (UTC)

arXiv
