Image GANs meet Differentiable Rendering for Inverse Graphics and Interpretable 3D Neural Rendering

Yuxuan Zhang; Wenzheng Chen; Huan Ling; Jun Gao; Yinan Zhang; Antonio Torralba; Sanja Fidler

Differentiable rendering has paved the way to training neural networks to perform "inverse graphics" tasks such as predicting 3D geometry from monocular photographs. To train high-performing models, most current approaches rely on multi-view imagery, which is not readily available in practice. Recent Generative Adversarial Networks (GANs) that synthesize images, in contrast, seem to acquire 3D knowledge implicitly during training: object viewpoints can be manipulated by simply manipulating the latent codes. However, these latent codes often lack further physical interpretation, and thus GANs cannot easily be inverted to perform explicit 3D reasoning. In this paper, we aim to extract and disentangle 3D knowledge learned by generative models by utilizing differentiable renderers. Key to our approach is to exploit GANs as a multi-view data generator to train an inverse graphics network using an off-the-shelf differentiable renderer, and to use the trained inverse graphics network as a teacher to disentangle the GAN's latent code into interpretable 3D properties. The entire architecture is trained iteratively using cycle consistency losses. We show that our approach significantly outperforms state-of-the-art inverse graphics networks trained on existing datasets, both quantitatively and via user studies. We further showcase the disentangled GAN as a controllable 3D "neural renderer", complementing traditional graphics renderers.
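The first half of the idea in the abstract (using a generator as a source of multi-view training data for an inverse network, then checking that latent code, image, and predicted viewpoint form a consistent cycle) can be illustrated with a deliberately tiny NumPy toy. Everything here is an assumption made for illustration and is not the authors' implementation: the "GAN" is a fixed random linear map, the "inverse graphics network" is a least-squares regressor, there is no differentiable renderer, and all names (`generate`, `make_multiview_dataset`, `predict_view`) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for a pretrained image GAN: a fixed linear map from a
# latent code [view | content] to a flattened "image". A real GAN is nonlinear.
VIEW_DIM, CONTENT_DIM, IMAGE_DIM = 2, 4, 32
G = rng.normal(size=(IMAGE_DIM, VIEW_DIM + CONTENT_DIM))


def generate(view_code, content_code):
    """Toy generator: one flattened image per row of latent codes."""
    return np.concatenate([view_code, content_code], axis=1) @ G.T


def make_multiview_dataset(n_objects, views):
    """Render every object (content code) under the same fixed set of view codes."""
    content = rng.normal(size=(n_objects, CONTENT_DIM))
    images, labels = [], []
    for v in views:
        v_batch = np.tile(v, (n_objects, 1))  # same view code for all objects
        images.append(generate(v_batch, content))
        labels.append(v_batch)
    return np.vstack(images), np.vstack(labels)


# Step 1: use the generator as a multi-view data source.
views = rng.normal(size=(5, VIEW_DIM))
images, view_labels = make_multiview_dataset(n_objects=50, views=views)

# Step 2: fit a toy "inverse network" (linear least squares) from image to view code.
W, *_ = np.linalg.lstsq(images, view_labels, rcond=1e-10)


def predict_view(imgs):
    """Toy inverse model: predicts the view code from flattened images."""
    return imgs @ W


# Step 3: cycle check on an unseen view and object:
# latent code -> image -> predicted view should recover the original view code.
new_view = rng.normal(size=(1, VIEW_DIM))
new_content = rng.normal(size=(1, CONTENT_DIM))
recovered = predict_view(generate(new_view, new_content))
cycle_error = float(np.mean((recovered - new_view) ** 2))
print("cycle-consistency error:", cycle_error)
```

In this linear toy the cycle closes almost exactly because the view code is linearly recoverable from the image. In the setting the abstract describes, the viewpoint is entangled nonlinearly in the GAN's latent code, which is why a learned inverse graphics network and iterative training with cycle consistency losses are needed.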

updated: Tue Apr 20 2021 18:06:17 GMT+0000 (UTC)

published: Sun Oct 18 2020 22:29:07 GMT+0000 (UTC)

arXiv
