AR-NeRF: Unsupervised Learning of Depth and Defocus Effects from Natural Images with Aperture Rendering Neural Radiance Fields

Takuhiro Kaneko

Fully unsupervised 3D representation learning has gained attention owing to its advantages in data collection. One successful line of work is viewpoint-aware: it learns an image distribution with a generative model (e.g., a generative adversarial network (GAN)) while generating images from various views with a 3D-aware model (e.g., a neural radiance field (NeRF)). However, such methods require training images with various views, so applying them to datasets with few or limited viewpoints remains a challenge. As a complementary approach, the aperture rendering GAN (AR-GAN), which employs a defocus cue, was proposed. However, AR-GAN is a CNN-based model and represents defocus independently of viewpoint change, even though the two are highly correlated, which is one of the factors limiting its performance. As an alternative to AR-GAN, we propose the aperture rendering NeRF (AR-NeRF), which can utilize viewpoint and defocus cues in a unified manner by representing both factors in a common ray-tracing framework. Moreover, to learn defocus-aware and defocus-independent representations in a disentangled manner, we propose aperture randomized training, in which the model learns to generate images while the aperture size and latent codes are randomized independently. In our experiments, we applied AR-NeRF to various natural image datasets, including flower, bird, and face images. The results demonstrate the utility of AR-NeRF for unsupervised learning of depth and defocus effects.
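The abstract does not give implementation details, so the following is only a minimal sketch of the two ideas it names, under stated assumptions. It uses a generic thin-lens depth-of-field construction (not necessarily the formulation used in AR-NeRF): rays for one pixel start at points sampled on an aperture disk and all pass through a common focus point, so an aperture radius of zero reduces to the ordinary pinhole ray. The function names, the lens plane orientation, the latent dimension, and the aperture range are all hypothetical choices made for illustration.

```python
import numpy as np


def sample_aperture_rays(origin, direction, aperture_radius, focus_distance,
                         n_samples, rng):
    """Generic thin-lens ray bundle for one pixel (illustrative assumption,
    not the paper's exact renderer).

    All returned rays pass through the point at `focus_distance` along the
    pinhole ray, so scene content at that distance stays sharp.
    """
    direction = direction / np.linalg.norm(direction)
    focus_point = origin + focus_distance * direction

    # Uniform samples on a disk; the sqrt keeps the density uniform in area.
    # Assumption: the lens plane is spanned by the world x and y axes.
    r = aperture_radius * np.sqrt(rng.uniform(size=n_samples))
    theta = rng.uniform(0.0, 2.0 * np.pi, size=n_samples)
    offsets = np.stack(
        [r * np.cos(theta), r * np.sin(theta), np.zeros(n_samples)], axis=1)

    origins = origin + offsets
    dirs = focus_point - origins
    dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)
    return origins, dirs


def sample_training_inputs(rng, latent_dim=8, max_aperture=0.1):
    """Draw a latent code and an aperture size independently of each other.

    `latent_dim` and `max_aperture` are hypothetical values.
    """
    z = rng.standard_normal(latent_dim)   # latent code
    s = rng.uniform(0.0, max_aperture)    # aperture size, sampled separately
    return z, s
```

In a renderer built on this sketch, the colors obtained along the sampled rays would be averaged per pixel to produce the defocused image. Because the aperture size is drawn separately from the latent code, the same latent code can be rendered both sharp and defocused, which is the intuition behind learning the two kinds of representation in a disentangled manner.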

updated: Mon Jun 13 2022 12:41:59 GMT+0000 (UTC)

published: Mon Jun 13 2022 12:41:59 GMT+0000 (UTC)

arXiv
