StylizedNeRF: Consistent 3D Scene Stylization as Stylized NeRF via 2D-3D Mutual Learning

Yi-Hua Huang; Yue He; Yu-Jie Yuan; Yu-Kun Lai; Lin Gao

3D scene stylization aims to generate stylized images of a scene from arbitrary novel views following a given set of style examples, while ensuring consistency when rendered from different views. Directly applying image or video stylization methods to 3D scenes cannot achieve such consistency. Thanks to recently proposed neural radiance fields (NeRF), we are able to represent a 3D scene in a consistent way. Consistent 3D scene stylization can be effectively achieved by stylizing the corresponding NeRF. However, there is a significant domain gap between style examples, which are 2D images, and NeRF, which is an implicit volumetric representation. To address this problem, we propose a novel mutual learning framework for 3D scene stylization that combines a 2D image stylization network and NeRF to fuse the stylization ability of the 2D stylization network with the 3D consistency of NeRF. We first pre-train a standard NeRF of the 3D scene to be stylized and replace its color prediction module with a style network to obtain a stylized NeRF. We then distill the prior knowledge of spatial consistency from NeRF into the 2D stylization network through a newly introduced consistency loss. We also introduce a mimic loss to supervise the mutual learning of the NeRF style module and to fine-tune the 2D stylization decoder. To further help our model handle ambiguities in 2D stylization results, we introduce learnable latent codes that obey probability distributions conditioned on the style. They are attached to training samples as conditional inputs to better learn the style module in our novel stylized NeRF. Experimental results demonstrate that our method is superior to existing approaches in both visual quality and long-range consistency.
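
As a rough illustration of the terms named in the abstract, the following Python sketch shows one plausible shape for a mimic loss, a consistency loss, and a style-conditioned latent code. It is not the authors' implementation: the function names, the use of plain mean squared error, the assumption that the second view has already been warped into the first, and the Gaussian reparameterization of the latent code are all assumptions made here for illustration.

```python
import math
import random


def _mse(pixels_a, pixels_b):
    """Mean squared error over two equal-length lists of (r, g, b) tuples."""
    total = 0.0
    for p, q in zip(pixels_a, pixels_b):
        total += sum((a - b) ** 2 for a, b in zip(p, q))
    return total / (3 * len(pixels_a))


def mimic_loss(nerf_rgb, net_rgb):
    """Assumed form: colors from the stylized NeRF should match the colors
    the 2D stylization network produces for the same pixels."""
    assert len(nerf_rgb) == len(net_rgb) and nerf_rgb
    return _mse(nerf_rgb, net_rgb)


def consistency_loss(rgb_view_a, rgb_view_b_warped, valid):
    """Assumed form: stylized colors of the same 3D point should agree across
    views. rgb_view_b_warped is taken to be already aligned to view a (for
    example via NeRF depth); valid[i] is False where the point is occluded."""
    kept_a = [p for p, v in zip(rgb_view_a, valid) if v]
    kept_b = [q for q, v in zip(rgb_view_b_warped, valid) if v]
    if not kept_a:
        return 0.0
    return _mse(kept_a, kept_b)


def sample_latent(mu, log_var, rng):
    """Hypothetical latent code: z = mu + sigma * eps with eps ~ N(0, 1),
    where mu and log_var would be predicted from the style."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mu, log_var)]


if __name__ == "__main__":
    nerf = [(0.5, 0.5, 0.5), (1.0, 0.0, 0.0)]
    net = [(0.5, 0.5, 0.5), (0.0, 0.0, 0.0)]
    print(mimic_loss(nerf, net))
    print(consistency_loss(nerf, net, [True, False]))
    print(sample_latent([0.0, 1.0], [0.0, 0.0], random.Random(0)))
```

In a real training loop these terms would be computed on batched tensors and combined with weights; the weighting and any feature-space variants are left out because the abstract does not specify them.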

updated: Tue May 24 2022 16:29:50 GMT+0000 (UTC)

published: Tue May 24 2022 16:29:50 GMT+0000 (UTC)

arXiv
